Targeted gene demethylation in plants

ABSTRACT

The present disclosure relates to the use of recombinant proteins for inducing epigenetic modifications at specific loci, as well as to methods of using these recombinant proteins for modulating the expression of genes in plants.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/480,623, filed internationally on Jan. 22, 2018, which is a U.S. National Phase patent application under 35 U.S.C. § 371 of International Application No. PCT/US2018/014741, filed Jan. 22, 2018, which claims the benefit of U.S. Provisional Application No. 62/450,929, filed on Jan. 26, 2017, and U.S. Provisional Application No. 62/547,053, filed on Aug. 17, 2017, the disclosures of each of which are incorporated herein by reference in their entirety.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The content of the electronic sequence listing (262232001301SUBSEQLIST.xml; Size: 473,222 bytes; and Date of Creation: Dec. 1, 2022) is herein incorporated by reference in its entirety.

FIELD

The present disclosure relates to the use of recombinant proteins for inducing epigenetic modifications at specific loci, as well as to methods of using these recombinant proteins for modulating the expression of genes in plants.

BACKGROUND

Epigenetic marks are enzyme-mediated chemical modifications of DNA and of its associated chromatin proteins. Although epigenetic marks do not alter the primary sequence of DNA, they do contain heritable information and play key roles in regulating genome function. Such modifications, including cytosine methylation, posttranslational modifications of histone tails and the histone core, and the positioning of nucleosomes (histone octamers wrapped with DNA), influence the transcriptional state and other functional aspects of chromatin. For example, methylation of DNA and certain residues on the histone H3 N-terminal tail, such as H3 lysine 9 (H3K9), are important for transcriptional gene silencing and the formation of heterochromatin.

Different pathways involved in epigenetic gene expression regulation have been previously described, and include histone deacetylation, H3K27 and H3K9 methylation, H3K4 demethylation, and DNA methylation of promoters. In plants, proteins generally do not link the recognition of a specific DNA sequence with the establishment of an epigenetic state. Thus, endogenous plant epigenetic regulators generally cannot be used for epigenetic modifications of specific genes or transgenes in plants. However, the ability to specifically induce epigenetic modifications at a target locus is desirable as this may allow for controlled expression of the locus (e.g., control over gene expression). Moreover, there is currently no robust method for selectively demethylating and activating the expression of plant genes.

Accordingly, a need exists for epigenetic regulators that are capable of being targeted to specific loci to induce epigenetic modifications at those loci in plants.

BRIEF SUMMARY

In one aspect, the present disclosure relates to a method for reducing methylation of a target nucleic acid in a plant, including: (a) providing a plant including a recombinant polypeptide including a DNA-binding domain and a TET1 polypeptide or fragment thereof; and (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target nucleic acid, thereby reducing methylation of the target nucleic acid. In some embodiments, the DNA-binding domain includes a zinc finger domain. In some embodiments, the zinc finger domain includes two, three, four, five, six, seven, eight, or nine zinc fingers. In some embodiments, the zinc finger domain is a zinc finger array. In some embodiments, the zinc finger domain is selected from the group of a Cys2His2 (C2H2) zinc finger domain, a CCCH zinc finger domain, a multi-cysteine zinc finger domain, and a zinc binuclear cluster domain. In some embodiments, the DNA-binding domain is selected from the group of a TAL effector targeting domain, a helix-turn-helix family DNA-binding domain, a basic domain, a ribbon-helix-helix domain, a TBP domain, a barrel dimer domain, a Rel homology domain, a BAH domain, a SANT domain, a Chromodomain, a Tudor domain, a Bromodomain, a PHD domain, a WD40 domain, and a MBD domain. In some embodiments, the DNA-binding domain includes a TAL effector targeting domain. In some embodiments, the DNA-binding domain includes three C2H2 zinc finger domains. In some embodiments that may be combined with any of the preceding embodiments, the TET1 polypeptide includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 8. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid is an endogenous nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid is a heterologous nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, expression of the target nucleic acid is activated as compared to a corresponding control nucleic acid.

In another aspect, the present disclosure provides a recombinant nucleic acid including a plant promoter and which encodes a recombinant polypeptide including a DNA-binding domain and a TET1 polypeptide or fragment thereof. The present disclosure further relates to expression vectors including the recombinant nucleic acid of the preceding embodiment, and a host cell including the expression vector of the preceding embodiment. The present disclosure also relates to a recombinant plant including the recombinant nucleic acid and/or polypeptide of the preceding embodiments.

In another aspect, the present disclosure provides a plant having reduced methylation of a target nucleic acid as a consequence of the method of any one of the preceding embodiments, as well as a progeny plant of the plant of the preceding embodiment. In some embodiments, the progeny plant has reduced methylation of the target nucleic acid and does not include the recombinant nucleic acid and/or polypeptide.

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, including: (a) providing a plant including a recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a TET1 polypeptide or fragment thereof; and a crRNA and a tracrRNA, or fusions thereof; and (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target nucleic acid, thereby reducing methylation of the target nucleic acid. In some embodiments, the TET1 polypeptide includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 8. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid is an endogenous nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid is a heterologous nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, expression of the target nucleic acid is activated as compared to a corresponding control nucleic acid.

In another aspect, the present disclosure provides a recombinant nucleic acid including a plant promoter and which encodes a recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a TET1 polypeptide or fragment thereof. The present disclosure further relates to expression vectors including the recombinant nucleic acid of the preceding embodiment, and a host cell including the expression vector of the preceding embodiment. The present disclosure also relates to a recombinant plant including the recombinant nucleic acid and/or polypeptide of the preceding embodiments.

In another aspect, the present disclosure provides a plant having reduced methylation of a target nucleic acid as a consequence of the method of any one of the preceding embodiments, as well as a progeny plant of the plant of the preceding embodiment. In some embodiments, the progeny plant has reduced methylation of the target nucleic acid and does not include the recombinant nucleic acid and/or polypeptide.

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, the method including: (a) providing a plant including a recombinant TET1-like polypeptide or fragment thereof; and (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target nucleic acid, thereby reducing methylation of the target nucleic acid.

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, the method including: (a) providing a plant including a recombinant nucleic acid encoding a TET1-like protein or fragment thereof; and (b) growing the plant under conditions where the recombinant nucleic acid is expressed and where the recombinant polypeptide is targeted to the one or more target nucleic acids, thereby reducing methylation of the target nucleic acid.

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, the method including: (a) providing a plant including a recombinant nucleic acid encoding a TET1-like protein or fragment thereof; and a crRNA and tracrRNA, or fusions thereof, and where the plant expresses a dCAS9 protein; and (b) growing the plant under conditions where the recombinant nucleic acid is expressed and where the recombinant polypeptide is targeted to the one or more target nucleic acids, thereby reducing methylation of the target nucleic acid. In some embodiments, the recombinant polypeptide includes a dCAS9 protein or fragment thereof. In some embodiments, the recombinant polypeptide includes an MS2 protein or fragment thereof. In some embodiments, the recombinant polypeptide includes an scFv antibody or fragment thereof.

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, including: (a) providing a plant including: a first recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a multimerized epitope; a second recombinant polypeptide including a TET1 polypeptide or fragment thereof and an affinity polypeptide that specifically binds to the epitope; a crRNA and a tracrRNA, or fusions thereof; and (b) growing the plant under conditions whereby the first and second recombinant polypeptides are targeted to the one or more target nucleic acids, thereby reducing methylation of the target nucleic acid. In some embodiments, the dCAS9 polypeptide has an amino acid sequence that is at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, or 100% identical to SEQ ID NO: 125. In some embodiments that may be combined with any of the preceding embodiments, the multimerized epitope includes a GCN4 epitope. In some embodiments, the multimerized epitope includes about 2 to about 10 copies of a GCN4 epitope. In some embodiments that may be combined with any of the preceding embodiments, the first polypeptide includes one or more linkers that link polypeptide units in the recombinant polypeptide. In some embodiments that may be combined with any of the preceding embodiments, the first polypeptide includes a nuclear localization signal (NLS). In some embodiments that may be combined with any of the preceding embodiments, the TET1 polypeptide includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 8. In some embodiments that may be combined with any of the preceding embodiments, the affinity polypeptide is an antibody. In some embodiments, the antibody is an scFv antibody. In some embodiments, the antibody includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 132. In some embodiments that may be combined with any of the preceding embodiments, the second polypeptide includes one or more linkers that link polypeptide units in the recombinant polypeptide. In some embodiments that may be combined with any of the preceding embodiments, the crRNA and the tracrRNA are fused together, thereby forming a guide RNA (gRNA). In some embodiments that may be combined with any of the preceding embodiments, expression of the nucleic acid is increased in the range of about 2-fold to about 100-fold as compared to a corresponding control. In some embodiments that may be combined with any of the preceding embodiments, expression of the nucleic acid is decreased in the range of about 2-fold to about 100-fold as compared to a corresponding control.

In another aspect, the present disclosure provides a recombinant vector including: a first nucleic acid sequence including a plant promoter and that encodes a recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a multimerized epitope; a second nucleic acid sequence including a plant promoter and that encodes a recombinant polypeptide including a TET1 polypeptide or fragment thereof and an affinity polypeptide that specifically binds to the epitope; and a third nucleic acid sequence including a promoter and that encodes a crRNA and a tracrRNA, or fusions thereof.

Also provided are host cells including the vector or one or more of the recombinant polypeptides or nucleic acids of any of the preceding embodiments, and a recombinant plant including the vector or one or more of the recombinant polypeptides or nucleic acids of any of the preceding embodiments.

In another aspect, the present disclosure provides a plant having reduced methylation of a target nucleic acid as a consequence of the method of any of the preceding embodiments. Also provided is a progeny plant of the plant of the preceding embodiment. In some embodiments, the progeny plant has reduced methylation of the target nucleic acid and does not include the recombinant polypeptides.

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, including: (a) providing a plant including a recombinant polypeptide including a DNA-binding domain and a methylcytosine dioxygenase polypeptide that includes the amino acid sequence HXD, where X is any amino acid; and (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target nucleic acid, thereby reducing methylation of the target nucleic acid. In some embodiments, the DNA-binding domain comprises a zinc finger domain. In some embodiments, the zinc finger domain includes two, three, four, five, six, seven, eight, or nine zinc fingers. In some embodiments, the zinc finger domain is a zinc finger array. In some embodiments, the zinc finger domain is selected from the group consisting of a Cys2His2 (C2H2) zinc finger domain, a CCCH zinc finger domain, a multi-cysteine zinc finger domain, and a zinc binuclear cluster domain. In some embodiments, the DNA-binding domain is selected from the group consisting of a TAL effector targeting domain, a helix-turn-helix family DNA-binding domain, a basic domain, a ribbon-helix-helix domain, a TBP domain, a barrel dimer domain, a Rel homology domain, a BAH domain, a SANT domain, a Chromodomain, a Tudor domain, a Bromodomain, a PHD domain, a WD40 domain, and a MBD domain. In some embodiments, the DNA-binding domain includes a TAL effector targeting domain. In some embodiments, the DNA-binding domain includes three C2H2 zinc finger domains. In some embodiments that may be combined with any of the preceding embodiments, the methylcytosine dioxygenase polypeptide is a TET polypeptide. In some embodiments, the TET polypeptide is a TET1 polypeptide. In some embodiments, the TET1 polypeptide includes the catalytic domain of TET1. In some embodiments, the TET1 polypeptide includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 8. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid is an endogenous nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, the target nucleic acid is a heterologous nucleic acid. In some embodiments that may be combined with any of the preceding embodiments, expression of the target nucleic acid is activated as compared to a corresponding control nucleic acid.

In another aspect, the present disclosure provides a recombinant nucleic acid including a plant promoter and which encodes a recombinant polypeptide including a DNA-binding domain and a methylcytosine dioxygenase polypeptide that includes the amino acid sequence HXD, where X is any amino acid.
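
By way of non-limiting illustration, the HXD sequence recited above (histidine, any amino acid, aspartate, in one-letter code) can be located in a candidate polypeptide sequence with a simple pattern search. The following Python sketch is offered only as an example under stated assumptions: the function name `find_hxd_motifs` and the example sequence in the usage note are hypothetical, and the presence of the three-residue pattern alone does not establish methylcytosine dioxygenase activity.

```python
import re
from typing import List, Tuple

# "HXD": histidine, then any single residue, then aspartate.
# The lookahead lets overlapping occurrences (e.g. "HHDD") all be reported.
# Assumption: residues are given as upper- or lower-case one-letter codes.
HXD_PATTERN = re.compile(r"(?=(H[A-Z]D))")


def find_hxd_motifs(protein_seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position, motif) for every HXD occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HXD_PATTERN.finditer(seq)]
```

For instance, for the made-up sequence `"MKHADLLHHD"` the function reports one motif starting at residue 3 (`HAD`) and one starting at residue 8 (`HHD`).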

In another aspect, the present disclosure provides a method for reducing methylation of a target nucleic acid in a plant, including: (a) providing a plant including: a first recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a multimerized epitope; a second recombinant polypeptide including a methylcytosine dioxygenase polypeptide that includes the amino acid sequence HXD, where X is any amino acid, and an affinity polypeptide that specifically binds to the epitope; and a crRNA and a tracrRNA, or fusions thereof; and (b) growing the plant under conditions whereby the first and second recombinant polypeptides are targeted to the one or more target nucleic acids, thereby reducing methylation of the target nucleic acid. In some embodiments, the dCAS9 polypeptide has an amino acid sequence that is at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, or 100% identical to SEQ ID NO: 125. In some embodiments that may be combined with any of the preceding embodiments, the multimerized epitope includes a GCN4 epitope. In some embodiments, the multimerized epitope includes about 2 to about 10 copies of a GCN4 epitope. In some embodiments that may be combined with any of the preceding embodiments, the first polypeptide includes one or more linkers that link polypeptide units in the recombinant polypeptide. In some embodiments that may be combined with any of the preceding embodiments, the first polypeptide includes a nuclear localization signal (NLS). In some embodiments that may be combined with any of the preceding embodiments, the methylcytosine dioxygenase polypeptide is a TET polypeptide. In some embodiments, the TET polypeptide is a TET1 polypeptide. In some embodiments, the TET1 polypeptide includes the catalytic domain of TET1. In some embodiments, the TET1 polypeptide includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 8. In some embodiments that may be combined with any of the preceding embodiments, the affinity polypeptide is an antibody. In some embodiments, the antibody is an scFv antibody. In some embodiments, the antibody includes an amino acid sequence that is at least 80% identical to SEQ ID NO: 132. In some embodiments that may be combined with any of the preceding embodiments, the second polypeptide includes one or more linkers that link polypeptide units in the recombinant polypeptide. In some embodiments that may be combined with any of the preceding embodiments, the crRNA and the tracrRNA are fused together, thereby forming a guide RNA (gRNA). In some embodiments that may be combined with any of the preceding embodiments, expression of the nucleic acid is increased in the range of about 2-fold to about 100-fold as compared to a corresponding control. In some embodiments that may be combined with any of the preceding embodiments, expression of the nucleic acid is decreased in the range of about 2-fold to about 100-fold as compared to a corresponding control.

In another aspect, the present disclosure provides a recombinant vector including: a first nucleic acid sequence including a plant promoter and that encodes a recombinant polypeptide including a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a multimerized epitope; a second nucleic acid sequence including a plant promoter and that encodes a recombinant polypeptide including a methylcytosine dioxygenase polypeptide that includes the amino acid sequence HXD, where X is any amino acid, and an affinity polypeptide that specifically binds to the epitope; and a third nucleic acid sequence including a promoter and that encodes a crRNA and a tracrRNA, or fusions thereof.

Also provided are host cells including the vector or one or more of the recombinant polypeptides or nucleic acids of any of the preceding embodiments, and a recombinant plant including the vector or one or more of the recombinant polypeptides or nucleic acids of any of the preceding embodiments.

In another aspect, the present disclosure provides a plant having reduced methylation of a target nucleic acid as a consequence of the method of any of the preceding embodiments. Also provided is a progeny plant of the plant of the preceding embodiment. In some embodiments, the progeny plant has reduced methylation of the target nucleic acid and does not include the recombinant polypeptides.

DESCRIPTION OF THE FIGURES

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the office upon request and payment of the necessary fee.

FIG. 1 illustrates flowering time in Col-0 wild-type plants, fwa mutant plants, and T1 transgenic plants carrying the ZF108_TET1-CD construct in the Col-0 background.

FIG. 2 illustrates results of CHOP-PCR in Col-0 wild-type plants, fwa mutant plants, and T1 transgenic plants carrying the ZF108_TET1-CD construct in the Col-0 background. DNA of the different lines shown in the figure (ZF108_TET1-CD lines are labelled #1, 6, 7, 9, 12, 15, and 16) was digested with the DNA methylation-sensitive enzyme McrBC. A region of the FWA promoter was analyzed. As a control region, the DNA methylated gene body of another gene was analyzed. The height of each bar represents the ratio of the amount of PCR product from the McrBC digested sample to the amount of PCR product from the undigested sample.
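
As a non-limiting illustration of the bar heights described for FIG. 2, the ratio of McrBC-digested to undigested PCR product can be computed as sketched below in Python. The function name `chop_pcr_ratio` and the numeric inputs in the usage note are hypothetical and are not data from the figure. Because McrBC cleaves methylated DNA, a ratio near 0 is consistent with a methylated region and a ratio near 1 with loss of methylation; that interpretation is stated here as general background and not as a result of the disclosure.

```python
def chop_pcr_ratio(digested_product: float, undigested_product: float) -> float:
    """Ratio of PCR product from the McrBC-digested sample to the undigested sample.

    Both inputs are amounts of PCR product in the same (arbitrary) units.
    """
    if undigested_product <= 0:
        raise ValueError("undigested product amount must be positive")
    if digested_product < 0:
        raise ValueError("digested product amount cannot be negative")
    return digested_product / undigested_product
```

With hypothetical amounts of 2.0 (digested) and 8.0 (undigested), `chop_pcr_ratio(2.0, 8.0)` gives 0.25.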

FIG. 3 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of four independent transgenic lines carrying the ZF108_TET1-CD construct that showed the late flowering phenotype was analysed by BS-seq. Methylation at different contexts (CG, CHG and CHH, where H is C, T, or A) is shown for a wild-type Col-0 plant and a representative ZF108_TET1-CD line. The FWA promoter region is marked in a red box.
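
For illustration only, the three sequence contexts named above (CG, CHG and CHH, where H is C, T, or A) can be assigned to a cytosine from the two bases that follow it on the same strand. The Python sketch below assumes a forward-strand DNA string and a 0-based index; the function name `cytosine_context` is hypothetical, and reverse-strand cytosines would need the reverse complement, which is not handled here.

```python
from typing import Optional

H_BASES = {"A", "C", "T"}  # "H" means any base except G


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at 0-based index i as 'CG', 'CHG' or 'CHH'.

    Returns None if the base is not C, if too few downstream bases are
    available, or if a downstream base is ambiguous (e.g. N).
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in H_BASES:
        return None
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in H_BASES:
        return "CHH"
    return None
```

For example, the first cytosine of `"CAG"` is in the CHG context, and the first cytosine of `"CTA"` is in the CHH context.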

FIG. 4 illustrates a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 3. DNA methylation of four independent transgenic lines carrying the ZF108_TET1-CD construct that showed the late flowering phenotype was analysed by BS-seq. Methylation at different contexts (CG, CHG and CHH, where H is C, T, or A) is shown for a wild-type Col-0 plant and a representative ZF108_TET1-CD line. The FWA promoter region is marked in a red box.

FIG. 5 illustrates RNA-seq analysis of Col-0 wild-type plants, fwa mutant plants, and T1 transgenic plants carrying the ZF108_TET1-CD construct. Four independent ZF108_TET1-CD lines, fwa-4 plants, and wild-type Col-0 control plants were analysed by RNA-seq. RPKM fold change between wild-type Col-0 and ZF108_TET1-CD lines, or between wild-type Col-0 and fwa-4, is presented for the FWA gene and the control housekeeping genes PP2A and IPP2. The fold change value in expression of each gene in the indicated line as compared to Col-0 wild-type plants is indicated on top of each bar.
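
As a non-limiting illustration of the RPKM fold change described for FIG. 5, the Python sketch below uses the commonly used definition of RPKM (reads per kilobase of transcript per million mapped reads) and divides the value for a line by the value for the wild-type control. The function names and the numbers in the usage note are hypothetical; the figure description does not state how the RPKM values were normalized or whether a pseudocount was used, so those details are assumptions.

```python
def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fold_change(sample_rpkm: float, control_rpkm: float) -> float:
    """Fold change of a sample relative to the control (e.g. wild-type Col-0).

    Assumption: no pseudocount is added, so the control value must be positive.
    """
    if control_rpkm <= 0:
        raise ValueError("control RPKM must be positive")
    return sample_rpkm / control_rpkm
```

For example, 200 reads on a 2,000 bp gene in a library of 10 million mapped reads gives an RPKM of 10, and a line at RPKM 50 compared with a control at RPKM 2 gives a 25-fold change.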

FIG. 6 illustrates the structure of exemplary fusion constructs used in a modified CRISPR-targeting scheme involving the use of MS2 proteins.

FIG. 7 illustrates how various crRNA sequences map to the FWA locus (SEQ ID NO: 195).

FIG. 8 illustrates the structure of exemplary fusion constructs used in a modified CRISPR-targeting scheme involving the use of SunTag constructs.

FIG. 9 illustrates a schematic of a SunTag targeting system that was used successfully to demethylate the FWA promoter.

FIG. 10 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of a late flowering transgenic line that carries the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTag22aa-TET1) construct was analyzed by BS-seq. Methylation at different contexts (CG, CHG and CHH, where H is C, T, or A) is shown for a wild-type Col-0 plant and the SunTag-TET1 line. The FWA promoter region is marked in a red box.

FIG. 11 illustrates a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 10. DNA methylation of a late flowering transgenic line that carries the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTag22aa-TET1) construct was analyzed by BS-seq. Methylation at different contexts (CG, CHG and CHH, where H is C, T, or A) is shown for a wild-type Col-0 plant and the SunTag-TET1 line. The FWA promoter region is marked in a red box.

FIG. 12 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of a late flowering transgenic line that carries the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS (SunTag14aa-TET1) construct was analyzed by BS-seq. Methylation at different contexts (CG, CHG and CHH, where H is C, T, or A) is shown for a wild-type Col-0 plant and the SunTag-TET1 line. The FWA promoter region is marked in a red box.

FIG. 13 illustrates a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 12. DNA methylation of a late flowering transgenic line that carries the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS (SunTag14aa-TET1) construct was analyzed by BS-seq. Methylation at different contexts (CG, CHG and CHH, where H is C, T, or A) is shown for a wild-type Col-0 plant and the SunTag-TET1 line. The FWA promoter region is marked in a red box.

FIG. 14 illustrates RNA-seq analysis of Col-0 wild-type plants and one independent T1 line for the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTag22aa-TET1-1) and two independent T1 lines for the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS (SunTag14aa-TET1-1 and SunTag14aa-TET1-2) construct. SunTag22aa-TET1, SunTag14aa-TET1 and wild-type Col-0 control plants were analysed by RNA-seq. RPKM fold change between wild-type Col-0 and SunTag22aa-TET1-1, SunTag14aa-TET1-1 or SunTag14aa-TET1-2 is presented for the FWA gene and the control housekeeping genes PP2A and IPP2. The fold change value in expression of each gene in the indicated line as compared to Col-0 wild-type plants is indicated on top of each bar.

FIG. 15 illustrates a schematic of a SunTag targeting system that was used successfully to demethylate the CACTA1 promoter.

FIG. 16 illustrates quantitative real-time PCR results in a bar graph showing relative expression of CACTA1 over IPP2 in Col-0 and two T1 plants containing the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTagCACTA1g2-22aa) transgene.
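
By way of example only, relative expression of a target gene over a reference gene such as IPP2 is often derived from quantitative real-time PCR threshold cycle (Ct) values as 2 raised to the negative difference in Ct. The Python sketch below assumes that calculation and 100% amplification efficiency for both primer pairs; the figure description does not state which quantification method was used, and the function name and the Ct values in the usage note are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression of a target over a reference gene as 2^-(Ct_target - Ct_ref).

    Assumption: both amplicons double every cycle (100% efficiency).
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)
```

With hypothetical Ct values of 24.0 for the target and 22.0 for the reference, the target is expressed at 0.25 relative to the reference.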

FIG. 17A, FIG. 17B, and FIG. 17C illustrate Whole Genome Bisulfite Sequencing results. DNA methylation of two independent transgenic lines that carry the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTagCACTA1g2-22aa) transgene was analyzed by BS-seq. Methylation levels in different contexts (CG, CHG and CHH, where H is C, T, or A) are shown for a wild-type Col-0 plant and the SunTag22aaCACTA1g2-22aa lines. A gray arrow indicates the gRNA binding site in the promoter region of CACTA1. A zoomed-in view of the targeted region is shown (right).

FIG. 18 illustrates the methylation levels in the region comprising 200 bp upstream and downstream of the gRNA binding site in a bar graph for Col-0 and two T1 plants containing the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTagCACTA1g2-22aa) transgene.
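
As a non-limiting illustration of a methylation level over the region spanning 200 bp upstream and downstream of a binding site, the Python sketch below pools bisulfite read counts for every cytosine in that window. The pooled (coverage-weighted) percentage is only one possible summary and is an assumption here, as are the function name `window_methylation`, the `(position, methylated_reads, total_reads)` input layout, and the example values in the usage note; the figure description does not specify how the plotted levels were computed.

```python
from typing import Iterable, Tuple

# Each call is (position, methylated_reads, total_reads) for one cytosine.
Call = Tuple[int, int, int]


def window_methylation(calls: Iterable[Call], site: int, flank: int = 200) -> float:
    """Percent methylation pooled over cytosines within `flank` bp of `site`."""
    methylated = 0
    covered = 0
    for position, meth_reads, total_reads in calls:
        if site - flank <= position <= site + flank:
            methylated += meth_reads
            covered += total_reads
    if covered == 0:
        raise ValueError("no read coverage in the requested window")
    return 100.0 * methylated / covered
```

For example, with hypothetical calls `[(100, 8, 10), (250, 2, 10), (900, 10, 10)]` and a site at position 200, only the first two cytosines fall in the window, giving 50.0 percent.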

FIG. 19 illustrates the genome-wide CG, CHG and CHH methylation levels in Col-0 and two T1 plants containing the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTagCACTA1g2-22aa) transgene. Percent methylation is depicted on the Y-axis.

FIG. 20 illustrates a schematic of a SunTag targeting system that was used successfully to demethylate the ROS1 promoter.

FIG. 21 illustrates quantitative real-time PCR results in a bar graph showing relative expression of ROS1 over IPP2 in two Col-0 and one transgenic plant containing the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (ROS1g2 SunTag22aa TET1cd) transgene.

FIG. 22 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of a wild-type Col-0 plant and a transgenic line that carries the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (ROS1g2 SunTag22aa TET1cd) construct was analyzed by BS-seq. Methylation levels in different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. The ROS1 promoter region is marked in a red box.

FIG. 23 illustrates a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 22. DNA methylation of a wild-type Col-0 plant and a transgenic line that carries the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (ROS1g2 SunTag22aa TET1cd) construct was analyzed by BS-seq. Methylation levels in different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. The ROS1 promoter region is marked in a red box.

FIG. 24 illustrates RNA-seq analysis of one wild-type Col-0 plant, and T1 transgenic plants carrying either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene in a bar graph.

FIG. 25A, FIG. 25B, and FIG. 25C illustrate Whole Genome Bisulfite Sequencing results. DNA methylation of one wild-type Col-0 plant and two independent transgenic lines that carry either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene was analyzed by BS-seq. Methylation levels at different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A red arrow indicates the ZF1CACTA1 binding site and a purple arrow indicates the ZF2CACTA1 binding site in the promoter region of CACTA1. A zoomed-in view of the targeted region is shown (right).

FIG. 26 illustrates the methylation levels in the region comprising 200 bp upstream and downstream of either the ZF1CACTA1 or ZF2CACTA1 binding site in a bar graph for Col-0 and a T1 plant containing either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene.

FIG. 27 illustrates the genome-wide CG, CHG and CHH methylation levels in one wild-type Col-0 plant and a T1 plant containing either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene. Percent methylation is depicted on the Y-axis.

FIG. 28 illustrates a metaplot showing CG, CHG, and CHH methylation levels over all protein coding genes and TEs in one wild-type Col-0 plant and a T1 plant containing either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene.

FIG. 29 illustrates quantitative real-time PCR results in a bar graph showing relative expression of CACTA1 over IPP2 in one wild-type Col-0 plant and a T2 plant that has retained the pUBQ10::ZF1CACTA1_3×Flag_TET1CD transgene (+) and a T2 plant that has had the transgene segregated away (−).

FIG. 30A, FIG. 30B, and FIG. 30C illustrates Whole Genome BisulfiteSequencing results. DNA methylation of one wild-type Col-0 plant and T2plants that have either retained the pUBQ10::ZF1CACTA1_3×Flag_TET1CD orthe pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene (+), or have had thetransgene segregated away (−) were analyzed by BS-seq. Methylationlevels at different contexts (CG, CHG and CHH, where H is C, T, or A)are shown. A red arrow indicates the ZF1 binding site and a blue arrowindicates the ZF2 binding site in the promoter region of CACTA1. A zoomin of the targeted region is shown (right).

FIG. 31 illustrates the methylation levels in the region comprising 200 bp upstream and downstream of the ZF1CACTA1 binding site in a bar graph for one wild-type Col-0 plant and a T2 plant that has retained the pUBQ10::ZF1CACTA1_3×Flag_TET1CD transgene (+) and a T2 plant that has had the transgene segregated away (−).

FIG. 32 illustrates the genome-wide CG, CHG and CHH methylation levels in one wild-type Col-0 plant and T2 plants that have either retained the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene (+), or have had the transgene segregated away (−). Percent methylation is depicted on the Y-axis.

FIG. 33 illustrates a metaplot showing CG, CHG, and CHH methylation levels over all protein coding genes and TEs in one wild-type Col-0 plant and T2 plants that have either retained the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene (+), or have had the transgene segregated away (−).

FIG. 34 illustrates RNA-seq analysis of one wild-type Col-0 plant and two independent T1 transgenic plants carrying the pUBQ10::ZF1ROS1_3×Flag_TET1cd transgene in a bar graph. RPKM values are indicated.

FIG. 35A, FIG. 35B, and FIG. 35C illustrate Whole Genome Bisulfite Sequencing results. DNA methylation of one wild-type Col-0 plant and two independent T1 transgenic plants carrying the pUBQ10::ZF1ROS1_3×Flag_TET1cd transgene was analyzed by BS-seq. Methylation levels at different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A blue arrow indicates the ZF1 binding site in the promoter region of ROS1. A zoomed-in view of the targeted region is shown (right).

FIG. 36 illustrates the methylation levels in the region comprising 200 bp upstream and downstream of the ZF1ROS1 binding site in a bar graph of one wild-type Col-0 plant and two independent T1 transgenic plants carrying the pUBQ10::ZF1ROS1_3×Flag_TET1cd transgene.

FIG. 37 illustrates the genome-wide CG, CHG and CHH methylation levels in Col-0 and two independent T1 transgenic plants carrying the pUBQ10::ZF1ROS1_3×Flag_TET1cd transgene. Percent methylation is depicted on the Y-axis.

FIG. 38 illustrates a metaplot showing CG, CHG, and CHH methylation levels over all protein coding genes and TEs of one wild-type Col-0 plant and two independent T1 transgenic plants carrying the pUBQ10::ZF1ROS1_3×Flag_TET1cd transgene.

FIG. 39 illustrates RNA-seq analysis of five Col-0 wild-type plants, fwa-4, two independent T1 lines for the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTag FWAg4-22aa-TET1) transgene, and two independent T1 lines for the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS (SunTag FWAg4-14aa-TET1) transgene displayed in a bar graph.

FIG. 40 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of one wild-type Col-0 plant and a late flowering transgenic line that carries the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTag FWAg4-22aa) transgene was analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A gray arrow indicates the gRNA4 binding site in the promoter region of FWA.

FIG. 41 illustrates a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 40. DNA methylation of one wild-type Col-0 plant and a late flowering transgenic line that carries the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS (SunTag FWAg4-22aa) construct was analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A gray arrow indicates the gRNA4 binding site in the promoter region of FWA.

FIG. 42 illustrates the flowering time of Col-0, fwa-4, and the segregating populations of T2 plants that have arisen from T1 plants containing either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene.

FIG. 43 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of T2 plants that have either retained the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene (+) or have had the transgenes segregated away (−) was analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A gray arrow indicates the gRNA4 binding site in the promoter region of FWA.

FIG. 44 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation of T2 plants that have either retained the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene (+) or have had the transgene segregated away (−) was analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A gray arrow indicates the gRNA4 binding site in the promoter region of FWA.

FIG. 45A, FIG. 45B, and FIG. 45C illustrate a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 43 and FIG. 44. DNA methylation of T2 plants that have either retained the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene (+), or have had the transgene segregated away (−), was analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. A gray arrow indicates the gRNA4 binding site in the promoter region of FWA.

FIG. 46 illustrates the genome-wide CG methylation levels in Col-0, T1 and T2 plants that contain either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene (+), and T2 plants where either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene has been segregated away in the T2 (−).

FIG. 47 illustrates the genome-wide CHG and CHH methylation levels in one wild-type Col-0 plant and T1 plants that contain either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene.

FIG. 48 illustrates the genome-wide CHG and CHH methylation levels in one wild-type Col-0 plant and T2 plants that contain either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene (+), or plants that had segregated away the transgenes (−).

FIG. 49 illustrates the genome-wide CG, CHG and CHH methylation levels in one wild-type Col-0 plant and a T2 plant that contains the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene (+) or a T2 plant where the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene has been segregated away in the T2 (−).

FIG. 50A-FIG. 50B illustrate flowering time data. FIG. 50A illustrates the flowering time of Col-0, fwa-4, and a population of T1 plants with ZF108-TET1cd. FIG. 50B illustrates the flowering time data of Col-0, fwa-4, T3 plants from 3 independent lines containing pUBQ10_ZF108_3×Flag_YPet and T3 plants from 3 independent lines that have either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) or plants where the pUBQ10::ZF108_3×Flag_TET1-CD transgene was segregated away (−).

FIG. 51A-FIG. 51B illustrate RNA-seq analysis data. FIG. 51A illustrates RNA-seq data of one wild-type Col-0 plant, an fwa-4 plant, and four independent T1 plants expressing the pUBQ10::ZF108_3×Flag_TET1-CD transgene with a bar graph of RPKM values (RPKM+1). FIG. 51B illustrates RNA-seq data of four replicates of Col-0 wild-type plants, four replicates from T3 plants from two independent lines containing pUBQ10::ZF108_3×Flag_YPet, and four replicates from T3 plants from two independent lines containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene in a bar graph of RPKM values (RPKM+1).

FIG. 52 illustrates a scatterplot of RNA-seq data comparing gene expression of ZF108-TET1cd lines and ZF108-YPet lines. Values were calculated using four biological replicates of two independent lines for ZF108-TET1cd and ZF108-YPet. Gray dots indicate non-differentially expressed genes. Blue dots indicate differentially expressed genes. A 4-fold change and an FDR less than 0.05 were used as cutoffs. FWA expression is highlighted in red.

FIG. 53 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation levels of one wild-type Col-0 plant and a late flowering T3 transgenic line that has either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) or where the transgene had been segregated away (−) were analyzed by BS-seq. Methylation levels at different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. The black triangles indicate the ZF108 binding sites in the promoter region of FWA.

FIG. 54 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation levels of one wild-type Col-0 plant and late flowering T3 transgenic lines that have either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) or where the transgene had been segregated away (−) were analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) are shown for a wild-type Col-0 plant and plants that have either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene or where the transgene had been segregated away. The black triangles indicate the ZF108 binding sites in the promoter region of FWA.

FIG. 55 illustrates a zoomed-out view of the Whole Genome Bisulfite Sequencing results presented in FIG. 53 and FIG. 54. DNA methylation levels of one wild-type Col-0 plant and two late flowering T3 transgenic lines that have either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) or where the transgene had been segregated away (−) were analyzed by BS-seq. Methylation levels at different contexts (CG, CHG and CHH, where H is C, T, or A) are shown. The blue triangle indicates the ZF108 binding sites in the promoter region of FWA.

FIG. 56 illustrates the genome-wide CG methylation levels in Col-0 plants, four independent T1 plants containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene, a T3 plant that retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) and a T3 plant that has had the transgene segregated away (−). Percent methylation is depicted on the Y-axis.

FIG. 57 illustrates the genome-wide CHG and CHH methylation levels in Col-0 plants and four independent T1 plants containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene. Percent methylation is depicted on the Y-axis.

FIG. 58 illustrates the genome-wide CHG and CHH methylation levels in one wild-type Col-0 plant, a T3 plant that retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) and a T3 plant that has had the transgene segregated away (−). Percent methylation is depicted on the Y-axis.

FIG. 59 illustrates the genome-wide CG, CHG and CHH methylation levels in one wild-type Col-0 plant and a plant that retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) and a T3 plant that has had the transgene segregated away (−) from another T3 line. Percent methylation is depicted on the Y-axis.

FIG. 60 illustrates a metaplot showing CG, CHG, and CHH methylation levels over all protein coding genes and TEs in one wild-type Col-0 plant, a T3 plant that retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene (+) and a T3 plant that has had the transgene segregated away (−).

FIG. 61 illustrates a schematic of a SunTag targeting system without a specific guide RNA for expression in Arabidopsis.

FIG. 62 illustrates the flowering time of Col-0, fwa-4 and T1 plants containing either the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene.

FIG. 63 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation levels of one wild-type Col-0 plant and two independent T1 plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene were analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) over an area that includes the FWA promoter are shown.

FIG. 64 illustrates Whole Genome Bisulfite Sequencing results. DNA methylation levels of one wild-type Col-0 plant and two independent T1 plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene were analyzed by BS-seq. Methylation levels of different contexts (CG, CHG and CHH, where H is C, T, or A) over an area that includes the CACTA1 promoter are shown.

FIG. 65 illustrates the genome-wide CG, CHG and CHH methylation levels of one wild-type Col-0 plant and two independent T1 plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene.

FIG. 66A, FIG. 66B, and FIG. 66C illustrate an alignment of the catalytic domain of human TET1 (SEQ ID NO: 8), TET2 (SEQ ID NO: 192), and TET3 (SEQ ID NO: 194). Yellow highlighting shows the Cys-rich domain, which is likely to chelate two or more Zn2+ ions via nine conserved Cys residues and one His residue. It has been postulated to be part of a DNA-binding surface that might help in target recognition (Pastor et al., 2013, Nature Rev Mol Cell Biol, June; 14(6): 341-356). Grey indicates the invariant P causing a kink, a unique feature of the TET family. Purple highlighting indicates the dioxygenase domain. Pink indicates the His-Xaa-Asp motif (“HXD”, where Xaa or X is any amino acid) and the C-terminal His, which are involved in coordinating Fe2+. The blue R residue binds to oxoglutarate via a salt bridge. Red lining above the amino acid sequences indicates the CTD-like region within the DSBH domain. Purple indicates the active sites.

FIG. 67 illustrates an alignment of the Cys-rich domain of TET1 (SEQ ID NO: 196), TET2 (SEQ ID NO: 197), and TET3 (SEQ ID NO: 198).

FIG. 68A-FIG. 68B illustrate an alignment of the double-stranded β-helix (DSBH) fold/Dioxygenase Domain of TET1 (SEQ ID NO: 200), TET2 (SEQ ID NO: 199), and TET3 (SEQ ID NO: 201).
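
As a minimal illustration of the CG, CHG, and CHH sequence contexts named in the figure descriptions above (where H is C, T, or A), the following Python sketch classifies a cytosine by the two bases that follow it on the same strand. The function name and the handling of sequence ends and ambiguous bases are assumptions made for illustration only and are not part of the disclosed methods.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of a 5'->3' sequence.

    Returns "CG", "CHG", or "CHH" (H = A, C, or T), or None when the
    base is not a cytosine or the context cannot be determined.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    following = seq[i + 1 : i + 3]  # up to two downstream bases
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    # CHG and CHH both need two unambiguous downstream bases.
    if len(following) < 2 or any(b not in "ACGT" for b in following):
        return None
    if following[1] == "G":
        return "CHG"
    return "CHH"
```

For example, in the sequence CCG the first cytosine would be reported as CHG and the second as CG.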

DETAILED DESCRIPTION

Overview

The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, methods, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments. Thus, the various embodiments are not intended to be limited to the examples described herein and shown, but are to be accorded the scope consistent with the claims.

The present disclosure relates to the use of recombinant proteins for inducing epigenetic modifications at specific loci, as well as to methods of using these recombinant proteins for modulating the expression of genes in plants.

Specifically, the present disclosure relates to compositions and methods for targeting recombinant TET proteins (e.g. TET1 proteins) to specific nucleic acids in plants to reduce methylation of the target nucleic acid.

The present disclosure is based, at least in part, on Applicant's discovery that the catalytic domain of a human TET1 protein, when recombinantly fused to a DNA-binding domain that targets a specific nucleic acid, could be targeted to and induce DNA de-methylation at the targeted nucleic acid in plants. The targeted nucleic acid exhibited a reduced level of methylation and an increased level of expression as compared to corresponding controls. This technology could be used to selectively induce DNA de-methylation at targeted nucleic acids in plants and to create novel expression-based traits for crop improvement.

Accordingly, the present disclosure provides methods and compositions for reducing methylation of a target nucleic acid in a plant by targeting a TET polypeptide (e.g. TET1 polypeptide) or fragment thereof to a target nucleic acid. Plants may be grown under conditions such that the TET polypeptide (e.g. TET1 polypeptide) or fragment thereof is targeted to the target nucleic acid, thereby reducing methylation of the target nucleic acid.

In some embodiments, the TET polypeptide (e.g. TET1 polypeptide) or fragment thereof has been engineered to specifically bind different DNA sequences via the introduction of a heterologous DNA-binding domain into the protein such as, for example, a heterologous zinc finger domain or TAL effector targeting domain. The heterologous DNA-binding domain directly facilitates targeting the TET1 polypeptide to the target nucleic acid to induce de-methylation.

In some embodiments, the TET polypeptide (e.g. TET1 polypeptide) or fragment thereof can be targeted to a specific locus of interest using a CRISPR-CAS9 targeting system. CRISPR-CAS9 systems involve the use of a CRISPR RNA (crRNA), a trans-activating CRISPR RNA (tracrRNA), and a CAS9 protein. The crRNA and tracrRNA aid in directing the CAS9 protein to a target nucleic acid sequence, and these RNA molecules can be specifically engineered to target specific nucleic acid sequences. In particular, certain aspects of the present disclosure involve the use of a single guide RNA (gRNA) that reconstitutes the function of the crRNA and the tracrRNA. Further, certain aspects of the present disclosure involve a CAS9 protein that does not exhibit DNA cleavage activity (dCAS9). As disclosed herein, gRNA molecules may be used to direct the dCAS9 protein to a target nucleic acid sequence. By recombinantly fusing a TET polypeptide (e.g. TET1 polypeptide) or fragment thereof of the present disclosure to a dCAS9 protein, use of the CRISPR targeting system allows for delivering the TET polypeptide (e.g. TET1 polypeptide) directly to a target nucleic acid.

Accordingly, the present disclosure provides methods for CRISPR-targeting of a TET polypeptide (e.g. TET1 polypeptide) to a specific locus to reduce methylation of the target locus. The TET polypeptide (e.g. TET1 polypeptide) may be recombinantly fused to a CAS9 protein, such as a nuclease-deficient CAS9 protein. The methods of the present disclosure also involve the use of a crRNA and tracrRNA to interact with the target nucleic acid. The crRNA and tracrRNA direct the recombinant protein of the present disclosure fused to a CAS9 protein to the target nucleic acid, thereby facilitating de-methylation of the target nucleic acid.

Accordingly, certain aspects of the present disclosure relate to targeting a TET-like protein (e.g. TET1-like protein) to a target nucleic acid. TET-like proteins (e.g. TET1-like proteins), or a fragment of the full-length coding sequence thereof, may contain a heterologous DNA-binding domain that directly facilitates targeting the TET polypeptide (e.g. TET1 polypeptide) to the target nucleic acid to induce de-methylation. TET-like proteins (e.g. TET1-like proteins), or a fragment of the full-length coding sequence thereof, may contain a heterologous coding sequence that encodes a protein involved in the targeting and/or recruitment of the TET polypeptide (e.g. TET1 polypeptide) to a target nucleic acid via the CRISPR-CAS9 system. The TET polypeptide (e.g. TET1 polypeptide) portion of a TET-like protein (e.g. TET1-like protein) may be present in various N-terminal or C-terminal orientations relative to the heterologous coding sequence present in a TET-like protein (e.g. TET1-like protein).

The use of the terms “a,” “an,” and “the,” and similar referents in the context of describing the disclosure (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if the range 10-15 is disclosed, then 11, 12, 13, and 14 are also disclosed. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the embodiments of the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the embodiments of the disclosure.

Reference to “about” a value or parameter herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) aspects that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”

It is understood that aspects and embodiments of the present disclosure described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.

It is to be understood that one, some, or all of the properties of the various embodiments described herein may be combined to form other embodiments of the present disclosure. These and other aspects of the present disclosure will become apparent to one of skill in the art. These and other embodiments of the present disclosure are further described by the detailed description that follows.

The terms “isolated” and “purified” as used herein refer to a material that is removed from at least one component with which it is naturally associated (e.g., removed from its original environment). The term “isolated,” when used in reference to an isolated protein, refers to a protein that has been removed from the culture medium of the host cell that expressed the protein. As such, an isolated protein is free of extraneous or unwanted compounds (e.g., nucleic acids, native bacterial or other proteins, etc.).

DNA-Binding Domains

Certain aspects of the present disclosure relate to TET-like proteins (e.g. TET1-like proteins) that have DNA-binding activity. In some embodiments, this DNA-binding activity is achieved through a heterologous DNA-binding domain (e.g. a domain that binds with a sequence affinity other than that of a DNA-binding domain that may be present in the endogenous protein). In some embodiments, TET-like proteins (e.g. TET1-like proteins) of the present disclosure contain a DNA-binding domain. TET-like proteins (e.g. TET1-like proteins) of the present disclosure may contain one DNA-binding domain or they may contain more than one DNA-binding domain. Heterologous DNA-binding domains may be recombinantly fused to a TET protein (e.g. TET1 protein) of the present disclosure such that the resulting TET-like protein (e.g. TET1-like protein) is then targeted to a specific nucleic acid sequence and can induce demethylation of the specific nucleic acid sequence.

In some embodiments, the DNA-binding domain is a zinc finger domain. A zinc finger domain generally refers to a DNA-binding protein domain that contains zinc fingers, which are small protein structural motifs that can coordinate one or more zinc ions to help stabilize their protein folding. Zinc fingers were first identified as DNA-binding motifs (Miller et al., 1985), and numerous other variations of them have been characterized. Progress has been made that allows the engineering of DNA-binding proteins that specifically recognize any desired DNA sequence. For example, it was shown that a three-finger zinc finger protein could be constructed to block the expression of a human oncogene that was transformed into a mouse cell line (Choo and Klug, 1994).

Zinc fingers can generally be classified into several different structural families and typically function as interaction modules that bind DNA, RNA, proteins, or small molecules. Suitable zinc finger domains of the present disclosure may contain two, three, four, five, six, seven, eight, or nine zinc fingers. Examples of suitable zinc finger domains may include, for example, Cys2His2 (C2H2) zinc finger domains, C-x8-C-x5-C-x3-H (CCCH) zinc finger domains, multi-cysteine zinc finger domains, and zinc binuclear cluster domains.

In some embodiments, the DNA-binding domain binds a specific nucleic acid sequence. For example, the DNA-binding domain may bind a sequence that is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 20 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 35 nucleotides, at least 40 nucleotides, at least 45 nucleotides, at least 50 nucleotides, or a higher number of nucleotides in length.

In some embodiments, a recombinant protein of the present disclosure further contains two N-terminal CCCH zinc finger domains.

In some embodiments, the zinc finger domain is an engineered zinc finger array, such as a C2H2 zinc finger array. Engineered arrays of C2H2 zinc fingers can be used to create DNA-binding proteins capable of targeting desired genomic DNA sequences. Methods of engineering zinc finger arrays are well known in the art, and include, for example, combining smaller zinc fingers of known specificity.

In some embodiments, recombinant proteins of the present disclosure may contain a DNA-binding domain other than a zinc finger domain. Examples of such DNA-binding domains may include, for example, TAL (transcription activator-like) effector targeting domains, helix-turn-helix family DNA-binding domains, basic domains, ribbon-helix-helix domains, TBP (TATA-box binding protein) domains, barrel dimer domains, RHD domains (Rel homology domain), BAH (bromo-adjacent homology) domains, SANT domains, chromodomains, Tudor domains, bromodomains, PHD domains (plant homeodomain), WD40 domains, and MBD domains (methyl-CpG-binding domain).

In some embodiments, the DNA-binding domain is a TAL effector targeting domain. TAL effectors generally refer to secreted bacterial proteins, such as those secreted by Xanthomonas or Ralstonia bacteria when infecting various plant species. Generally, TAL effectors are capable of binding promoter sequences in the host plant, and activate the expression of plant genes that aid in bacterial infection. TAL effectors recognize plant DNA sequences through a central repeat targeting domain that contains a variable number of repeats of approximately 34 amino acids each. Moreover, TAL effector targeting domains can be engineered to target specific DNA sequences. Methods of modifying TAL effector targeting domains are well known in the art, and are described in Bogdanove and Voytas, Science. 2011 Sep. 30; 333(6051):1843-6.

Other DNA-binding domains for use in the methods and compositions of the present disclosure will be readily apparent to one of skill in the art, in view of the present disclosure.

CRISPR-CAS9

Certain methods of the present disclosure relate to using a CRISPR-CAS9 targeting system to target a TET protein (e.g. TET1 protein) to a target nucleic acid and induce demethylation of the target nucleic acid.

CRISPR systems naturally use small base-pairing guide RNAs to target and cleave foreign DNA elements in a sequence-specific manner (Wiedenheft et al., 2012). There are diverse CRISPR systems in different organisms that may be used to target proteins of the present disclosure to a target nucleic acid. One of the simplest systems is the type II CRISPR system from Streptococcus pyogenes. Only a single gene encoding the CAS9 protein and two RNAs, a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA), are necessary and sufficient for RNA-guided silencing of foreign DNAs (Jinek et al., 2012). Maturation of crRNA requires tracrRNA and RNase III (Deltcheva et al., 2011). However, this requirement can be bypassed by using an engineered single guide RNA (gRNA) containing a designed hairpin that mimics the tracrRNA-crRNA complex (Jinek et al., 2012). Base pairing between the gRNA and target DNA normally causes double-strand breaks (DSBs) due to the endonuclease activity of CAS9.

It is known that the endonuclease domains of the CAS9 protein can be mutated to create a programmable RNA-dependent DNA-binding protein (dCAS9) (Qi et al., 2013). The fact that duplex gRNA-dCAS9 binds target sequences without endonuclease activity has been used to tether regulatory proteins, such as transcriptional activators or repressors, to promoter regions in order to modify gene expression (Gilbert et al., 2013), and CAS9 transcriptional activators have been used for target specificity screening and paired nickases for cooperative genome engineering (Mali et al., 2013, Nature Biotechnology 31:833-838). Thus, dCAS9 may be used as a modular RNA-guided platform to recruit different proteins to DNA in a highly specific manner. One of skill in the art would recognize other RNA-guided DNA binding protein/RNA complexes that can be used equivalently to CRISPR-CAS9.

The CRISPR-CAS9 system may be used to target a TET1 protein of the present disclosure to a specific nucleic acid. Targeting using CRISPR-CAS9 may be beneficial over other genome targeting techniques in certain instances. For example, one need only change the guide RNAs in order to target fusion proteins to a new genomic location, or even multiple locations simultaneously. In addition, guide RNAs can be extended to include sites for binding to proteins, such as the MS2 protein, which can be fused to proteins of interest.

CAS9 Proteins

A variety of CAS9 proteins may be used in the methods of the present disclosure. There are several CAS9 genes present in different bacterial species (Esvelt, K et al., 2013, Nature Methods). One of the best characterized CAS9 proteins is the CAS9 protein from S. pyogenes which, in order to be active, needs to bind a gRNA with a specific sequence and requires the presence of a PAM motif (NGG, where N is any nucleotide) at the 3′ end of the target locus. However, other CAS9 proteins from different bacterial species show differences in 1) the sequence of the gRNA they can bind and 2) the sequence of the PAM motif. Therefore, it is possible that other CAS9 proteins such as, for example, those from Streptococcus thermophilus or Neisseria meningitidis may also be utilized herein. Indeed, these two CAS9 proteins have a smaller size (around 1100 amino acids) as compared to S. pyogenes CAS9 (around 1400 amino acids), which may confer some advantages during cloning or protein expression.
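
As a rough illustration of the PAM requirement described above for S. pyogenes CAS9, the following Python sketch scans one strand of a DNA sequence for candidate target sites that are immediately followed by an NGG motif at their 3′ end. The function name, the 20-nucleotide protospacer length, and the restriction to the given strand are assumptions made for this sketch; they are not specified in the present disclosure, and the sketch does not assess guide specificity or efficiency.

```python
import re
from typing import List, Tuple


def find_ngg_sites(seq: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each candidate site on the given strand.

    A candidate site is a stretch of `protospacer_len` unambiguous bases
    (an assumed length) immediately followed by a PAM of the form NGG.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

A CAS9 protein with a different PAM preference could be accommodated in such a sketch by changing the PAM portion of the pattern; the opposite strand could be covered by scanning the reverse complement.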

CAS9 proteins from a variety of bacteria have been used successfully in engineered CRISPR-CAS9 systems. There are also versions of CAS9 proteins available in which the codon usage has been more highly optimized for expression in eukaryotic systems, such as human codon optimized CAS9 (Cell, 152:1173-1183) and plant optimized CAS9 (Nature Biotechnology, 31:688-691).

CAS9 proteins may also be modified for various purposes. For example, CAS9 proteins may be engineered to contain a nuclear-localization sequence (NLS). CAS9 proteins may be engineered to contain an NLS at the N-terminus of the protein, at the C-terminus of the protein, or at both the N- and C-termini of the protein. Engineering a CAS9 protein to contain an NLS may assist with directing the protein to the nucleus of a host cell. CAS9 proteins may be engineered such that they are unable to cleave nucleic acids (e.g. nuclease-deficient dCAS9 polypeptides). One of skill in the art would be able to readily identify a suitable CAS9 protein for use in the methods and compositions of the present disclosure.

Exemplary CAS9 proteins that may be used in the methods and compositions of the present disclosure may include, for example, a CAS9 protein having the amino acid sequence of any one of SEQ ID NO: 15, SEQ ID NO: 16, and/or SEQ ID NO: 17, homologs thereof, and fragments thereof.

In some embodiments, a CAS9 polypeptide or fragment thereof of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, or SEQ ID NO: 125. In some embodiments, the CAS9 polypeptide does not have nuclease activity and is unable to cleave a nucleic acid molecule (e.g. dCAS9 polypeptide).

CRISPR RNAs

The CRISPR RNA (crRNA) of the present disclosure may take a variety of forms. As described above, the sequence of the crRNA is involved in conferring specificity to targeting a specific nucleic acid.

Many different crRNA molecules can be designed to target many different sequences. With respect to targeting, target nucleic acids generally require the PAM sequence, NGG, at the 3′ end of the 20 base pair target sequence. crRNAs of the present disclosure may be expressed as a single crRNA molecule, or they may be expressed in the form of a crRNA/tracrRNA hybrid molecule where the crRNA and the tracrRNA have been fused together, forming a guide RNA (gRNA). crRNA molecules and/or guide RNA molecules may be extended to include sites for the binding of RNA binding proteins.
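As a rough illustration of the targeting rule described above, the following Python sketch lists every 20-nucleotide stretch of a DNA sequence that is immediately followed by an NGG PAM. It is a minimal sketch only: the function name and the example sequence are hypothetical, only the forward strand is scanned, and no off-target or efficiency scoring is attempted.

```python
import re


def find_ngg_targets(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each forward-strand site
    where a protospacer is immediately followed by an NGG PAM.

    `start` is the 0-based index of the first protospacer nucleotide.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Hypothetical example sequence (not taken from the sequence listing).
example = "ACGT" * 5 + "AGG" + "TT"
for start, protospacer, pam in find_ngg_targets(example):
    print(start, protospacer, pam)
```

To cover the opposite strand as well, the same function could be applied to the reverse complement of the sequence.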

Multiple crRNAs and/or guide RNAs can be encoded into a single CRISPR array to enable simultaneous targeting to several sites (Science 2013: Vol. pp. 819-823). For example, the tracrRNA may be expressed separately, and two adjacent target sequences may be encoded in a pre-crRNA array interspaced with repeats.

A variety of promoters may be used to drive expression of the crRNA and/or the guide RNA. crRNAs and/or guide RNAs may be expressed using a Pol III promoter such as, for example, the U6 promoter or the H1 promoter (eLife 2013 2:e00471). For example, an approach in plants has been described using three different Pol III promoters from three different Arabidopsis U6 genes, and their corresponding gene terminators (BMC Plant Biology 2014 14:327). One skilled in the art would readily understand that many additional Pol III promoters could be utilized to simultaneously express many crRNAs and/or guide RNAs targeting many different locations in the genome. The use of different Pol III promoters for each crRNA and/or gRNA expression cassette may be desirable to reduce the chances of natural gene silencing that can occur when multiple copies of identical sequences are expressed in plants. In addition, crRNAs and/or guide RNAs can be modified to improve the efficiency of their function in guiding CAS9 to a target nucleic acid. For example, it has been shown that adding either 8 or 20 additional nucleotides to the gRNA in order to extend the hairpin by 4 or 10 base pairs resulted in more efficient CAS9 activity (eLife 2013 2:e00471).

Alternatively, a tRNA-gRNA expression cassette (Xie, X et al, 2015, Proc Natl Acad Sci USA. 2015 Mar. 17; 112(11):3570-5) may be used to deliver multiple gRNAs simultaneously with high expression levels.

Trans-Activating CRISPR RNAs

The trans-activating CRISPR RNA (tracrRNA) of the present disclosure may take a variety of forms, as will be readily understood by one of skill in the art. As described above, tracrRNAs are involved in the maturation of a crRNA. tracrRNAs of the present disclosure may be expressed as a single tracrRNA molecule, or they may be expressed in the form of a crRNA/tracrRNA hybrid molecule where the crRNA and the tracrRNA have been fused together, forming a guide RNA (gRNA). tracrRNA molecules and/or guide RNA molecules may be extended to include sites for the binding of RNA binding proteins.

As CRISPR systems naturally exist in a variety of bacteria, the framework of the crRNA and tracrRNA in these bacteria may be adapted for use in the methods and compositions described herein. crRNAs, tracrRNAs, and/or guide RNAs of the present disclosure may be constructed based on the framework of one or more of these molecules in, for example, S. pyogenes, Streptococcus thermophilus, and/or N. meningitidis. For example, a guide RNA of the present disclosure may be constructed based on the framework of the crRNA and tracrRNA from S. pyogenes (SEQ ID NO: 18), Streptococcus thermophilus (SEQ ID NO: 19), and/or N. meningitidis (SEQ ID NO: 20). In these exemplary frameworks, the 5′ end of the sequence contains 20 generic nucleotides (N) that correspond to the crRNA targeting sequence. This sequence will vary depending on the sequence of the particular nucleic acid being targeted.
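The substitution of the 20 generic nucleotides by a specific targeting sequence can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the framework is assumed to be written as RNA, and the short scaffold used in the example is a made-up stand-in, not the sequence of SEQ ID NO: 18, 19, or 20.

```python
def build_guide(framework: str, targeting_seq: str, placeholder_len: int = 20) -> str:
    """Replace the leading run of generic 'N' nucleotides in a guide RNA
    framework with a specific targeting sequence of the same length."""
    framework = framework.upper()
    # Accept the targeting sequence as DNA or RNA; emit RNA (U instead of T).
    targeting_seq = targeting_seq.upper().replace("T", "U")
    if framework[:placeholder_len] != "N" * placeholder_len:
        raise ValueError("framework must start with the generic N placeholder")
    if len(targeting_seq) != placeholder_len or set(targeting_seq) - set("ACGU"):
        raise ValueError("targeting sequence must be 20 unambiguous nucleotides")
    return targeting_seq + framework[placeholder_len:]


# Hypothetical stand-in framework: 20 N's followed by a made-up scaffold stub.
toy_framework = "N" * 20 + "GUUUUAGAGCUA"
print(build_guide(toy_framework, "ACGT" * 5))
```

Only the first 20 positions change between guides; the remainder of the framework is reused unchanged for every target.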

Linkers

Various linkers may be used in the construction of recombinant proteins as described herein. In general, linkers are short peptides that separate the different domains in a multi-domain protein. They may play an important role in fusion proteins, affecting the crosstalk between the different domains, the yield of protein production, and the stability and/or the activity of the fusion proteins. Linkers are generally classified into two major categories: flexible or rigid. Flexible linkers are typically used when the fused domains require a certain degree of movement or interaction, and these linkers are usually composed of small amino acids such as, for example, glycine (G), serine (S) or proline (P).

The degree of movement between domains allowed by flexible linkers is an advantage in some fusion proteins. However, it has been reported that flexible linkers can sometimes reduce protein activity due to an inefficient separation of the two domains. In this case, rigid linkers may be used since they enforce a fixed distance between domains and promote their independent functions. A thorough description of several linkers has been provided in Chen X et al., 2013, Advanced Drug Delivery Reviews 65:1357-1369.

Various linkers may be used in, for example, the construction of recombinant TET1 polypeptides that are fused to a CAS9 protein as described herein. Linkers may be used in the TET1-CAS9 fusion proteins described herein to separate the coding sequences of a TET1 polypeptide and a CAS9 protein. For example, a variety of wiggly/flexible linkers, stiff/rigid linkers, short linkers, and long linkers may be used as described herein. Various linkers as described herein may be used in the construction of TET1-like proteins as described herein.

A variety of shorter or longer linker regions are known in the art, for example corresponding to a series of glycine residues, a series of adjacent glycine-serine dipeptides, a series of adjacent glycine-glycine-serine tripeptides, or known linkers from other proteins. A flexible linker may include, for example, the amino acid sequence: SSGPPPGTG (SEQ ID NO: 164) and variants thereof. A rigid linker may include, for example, the amino acid sequence: AEAAAKEAAAKA (SEQ ID NO: 165) and variants thereof. The XTEN linker, SGSETPGTSESATPES (SEQ ID NO: 166), and variants thereof, described in Guilinger et al., 2014 (Nature Biotechnology 32, 577-582), may also be used. This particular linker was previously shown to produce the best results among several linkers tested in a protein fusion between dCAS9 and the nuclease FokI.

The linkers having the nucleotide sequences presented in SEQ ID NO: 139 and SEQ ID NO: 140 may also be used in the methods and compositions as described herein. The linker having the amino acid sequence presented in SEQ ID NO: 141 may also be used in the methods and compositions as described herein.

Variations of CRISPR-CAS9 Targeting

Certain aspects of the present disclosure relate to recombinantly fusing a TET polypeptide (e.g. TET1 polypeptide) of the present disclosure to a CAS9 protein. However, CRISPR-CAS9 targeting schemes as described herein to target a specific nucleic acid may also involve schemes where a polypeptide of the present disclosure is targeted to a specific nucleic acid without being recombinantly fused to a CAS9 protein.

Recombinant proteins containing a TET polypeptide (e.g. TET1 polypeptide) recombinantly fused to an RNA-binding protein may be used in targeting of the TET polypeptide (e.g. TET1 polypeptide) to a specific nucleic acid via CRISPR-CAS9 targeting. In some embodiments, a TET polypeptide (e.g. TET1 polypeptide) is recombinantly fused to an MS2 coat protein such that these fusion proteins may be directed to a target nucleic acid with the assistance of a CAS9 protein. In some embodiments, MS2 targeting systems may involve a fusion of a TET polypeptide (e.g. TET1 polypeptide) to a dCAS9 polypeptide. In some embodiments, the TET-dCAS9 fusion (e.g. TET1-dCAS9 fusion) is a direct fusion. In some embodiments, the TET-dCAS9 fusion (e.g. TET1-dCAS9 fusion) is an indirect fusion.

Various MS2 coat proteins may be used, such as SEQ ID NO: 52 and homologs thereof. This targeting scheme is further described herein and will be readily understood by one of skill in the art in view of the present disclosure.

In addition to fusing a TET polypeptide (e.g. TET1 polypeptide) to an MS2 coat protein, other RNA-binding proteins may also be used in this targeting scheme. For example, the proteins PP7 and COM (Zalatan et al., Cell 160, 339-350) may also be recombinantly fused to a TET polypeptide (e.g. TET1 polypeptide) such that these fusion proteins may be directed to a target nucleic acid with the assistance of a CAS9 protein.

Recombinant proteins containing a TET polypeptide (e.g. TET1 polypeptide) recombinantly fused to an antibody or fragment thereof may be used in targeting of the TET polypeptide (e.g. TET1 polypeptide) to a specific nucleic acid via CRISPR-CAS9 targeting. In some embodiments, a TET polypeptide (e.g. TET1 polypeptide) is recombinantly fused to an scFv antibody such that these fusion proteins may be directed to a target nucleic acid with the assistance of a CAS9 protein. Various scFv antibodies may be used, such as SEQ ID NO: 53 and homologs thereof. This targeting scheme is further described herein and will be readily understood by one of skill in the art in view of the present disclosure.

Similar systems using antibody mimetic proteins or proteins which can bind other proteins may also be used in the methods described herein. For example, designed ankyrin repeat proteins (DARPins), which are small and highly stable proteins that can bind their epitopes with strong affinity (Binz et al., 2004, Nat. Biotechnol. 22, 575-582), may be recombinantly fused to a TET polypeptide (e.g. TET1 polypeptide) such that these fusion proteins may be directed to a target nucleic acid with the assistance of a CAS9 protein.

SunTag Systems

Certain aspects of the present disclosure relate to the use of SunTag systems for targeting (using CRISPR-based targeting) a TET polypeptide (e.g. TET1 polypeptide) of the present disclosure to a target nucleic acid. A synthetic system was previously developed for use in mammals for recruiting multiple copies of a protein to a target polypeptide chain, and this system was called a SunTag system (Tanenbaum et al., 2014) (WO2016011070). This system was also adapted so that the multiple copies of the protein using the SunTag system could be targeted to a nucleic acid using the CRISPR-Cas9 system (Tanenbaum et al., 2014). However, this system was developed for use in mammals. Provided herein are methods and compositions for SunTag systems adapted to target TET polypeptides (e.g. TET1 polypeptides) to specific loci in plants.

Accordingly, the present disclosure provides methods and compositions for the recruitment of multiple copies of a TET polypeptide (e.g. TET1 polypeptide) to a target nucleic acid in plants via CRISPR-based targeting in a manner that allows for demethylation and/or activation of the target nucleic acid. In certain aspects, this specific targeting involves the use of a system that includes (1) a nuclease-deficient CAS9 polypeptide that is recombinantly fused to a multimerized epitope, (2) a TET polypeptide (e.g. TET1 polypeptide) that is recombinantly fused to an affinity polypeptide, and (3) a guide RNA (gRNA). In this aspect, the dCAS9 portion of the dCAS9-multimerized epitope fusion protein is involved with targeting a target nucleic acid as directed by the guide RNA. The multimerized epitope portion of the dCAS9-multimerized epitope fusion protein is involved with binding to the affinity polypeptide (which is recombinantly fused to a TET polypeptide (e.g. TET1 polypeptide)). The affinity polypeptide portion of the TET polypeptide (e.g. TET1 polypeptide)-affinity polypeptide fusion protein is involved with binding to the multimerized epitope so that the TET polypeptide (e.g. TET1 polypeptide) can be in association with dCAS9. The TET polypeptide (e.g. TET1 polypeptide) portion of the TET polypeptide (e.g. TET1 polypeptide)-affinity polypeptide fusion protein is involved with inducing demethylation and/or activation of a target nucleic acid, once the complex has been targeted to a target nucleic acid via the guide RNA.

As described above, SunTag systems involve targeting based on CRISPR-CAS9 systems. CRISPR-CAS9 systems are described above. The features of CRISPR-CAS9 systems may be used in SunTag systems of the present disclosure as appropriate, as will be readily understood by one of skill in the art.

Affinity Polypeptides

Certain aspects of the present disclosure relate to recombinant polypeptides that contain an affinity polypeptide. Affinity polypeptides of the present disclosure may bind to one or more epitopes (e.g. a multimerized epitope). In some embodiments, an affinity polypeptide is present in a recombinant polypeptide that contains a TET polypeptide (e.g. TET1 polypeptide) and an affinity polypeptide.

A variety of affinity polypeptides are known in the art and may be used herein. Generally, the affinity polypeptide should be stable in the conditions present in the intracellular environment of a plant cell. Additionally, the affinity polypeptide should specifically bind to its corresponding epitope with minimal cross-reactivity.

The affinity polypeptide may be an antibody such as, for example, an scFv. The antibody may be optimized for stability in the plant intracellular environment. When a GCN4 epitope is used in the methods described herein, a suitable affinity polypeptide that is an antibody may contain an anti-GCN4 scFv domain.

In embodiments where the affinity polypeptide is an scFv antibody, the polypeptide may contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of SEQ ID NO: 132.

Other exemplary affinity polypeptides include, for example, proteins with SH2 domains or the domain itself, 14-3-3 proteins, proteins with SH3 domains or the domain itself, the Alpha-Syntrophin PDZ protein interaction domain, the PDZ signal sequence, or proteins from plants which can recognize AGO hook motifs (e.g. AGO4 from Arabidopsis thaliana).

Additional affinity polypeptides that may be used in the methods and compositions described herein will be readily apparent to those of skill in the art.

Epitopes and Multimerized Epitopes

Certain aspects of the present disclosure relate to recombinant polypeptides that contain an epitope or a multimerized epitope. Epitopes of the present disclosure may bind to an affinity polypeptide. In some embodiments, an epitope or multimerized epitope is present in a recombinant polypeptide that contains a dCAS9 polypeptide.

Epitopes of the present disclosure may be used for recruiting affinity polypeptides (and any polypeptides they may be recombinantly fused to) to a dCAS9 polypeptide. In embodiments where a dCAS9 polypeptide is fused to an epitope or a multimerized epitope, the dCAS9 polypeptide may be fused to one copy of an epitope, multiple copies of an epitope, more than one different epitope, or multiple copies of more than one different epitope as further described herein.

A variety of epitopes and multimerized epitopes are known in the art and may be used herein. In general, the epitope or multimerized epitope may be any polypeptide sequence that is specifically recognized by an affinity polypeptide of the present disclosure. Exemplary epitopes may include a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, a VSV-G epitope, and a GCN4 epitope.

Other exemplary amino acid sequences that may serve as epitopes and multimerized epitopes include, for example, phosphorylated tyrosines in specific sequence contexts recognized by SH2 domains, characteristic consensus sequences containing phosphoserines recognized by 14-3-3 proteins, proline rich peptide motifs recognized by SH3 domains, the PDZ protein interaction domain or the PDZ signal sequence, and the AGO hook motif from plants.

Epitopes described herein may also be multimerized. Multimerized epitopes may include at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, or at least 24 or more copies of an epitope.

Multimerized epitopes may be present as tandem copies of an epitope, or each individual epitope may be separated from another epitope in the multimerized epitope by a linker or other amino acid sequence. Suitable linker regions are known in the art and are described herein. The linker may be configured to allow the binding of affinity polypeptides to adjacent epitopes without substantial steric hindrance. Linker sequences may also be configured to provide an unstructured or linear region of the polypeptide to which they are recombinantly fused. The linker sequence may comprise e.g. one or more glycines and/or serines. The linker sequences may be e.g. at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 or more amino acids in length. The linker sequences may be e.g. 5-10, 10-15, 15-20, or 20-25 amino acids in length.

In some embodiments, the epitope is a GCN4 epitope (SEQ ID NO: 138). In some embodiments, the multimerized epitope contains at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, or at least 24 copies of a GCN4 epitope. In some embodiments, the multimerized epitope contains 10 copies of a GCN4 epitope.
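The layout of a multimerized epitope, with or without linkers between copies, can be illustrated with a short Python sketch. The function name is hypothetical, and both the epitope string ("PEPTIDE") and the glycine/serine linker ("GGSGG") are placeholders chosen for illustration; they are not the sequences of SEQ ID NO: 138 or of any linker in the sequence listing.

```python
def multimerize_epitope(epitope: str, copies: int, linker: str = "") -> str:
    """Join `copies` copies of an epitope, separating adjacent copies with
    `linker`. An empty linker gives direct tandem copies."""
    if copies < 2:
        raise ValueError("a multimerized epitope needs at least 2 copies")
    return linker.join([epitope] * copies)


# Placeholder sequences (assumptions for illustration only).
tandem = multimerize_epitope("PEPTIDE", 3)
spaced = multimerize_epitope("PEPTIDE", 10, linker="GGSGG")
print(tandem)
print(len(spaced))  # 10 epitopes of 7 residues plus 9 linkers of 5 residues
```

Because the linker is inserted only between adjacent copies, n copies always carry n - 1 linkers, which makes the overall length of the array easy to predict when linker length is varied.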

Additional epitopes and multimerized epitopes that may be used in the methods and compositions described herein will be readily apparent to those of skill in the art.

Recombinant Polypeptides

Certain aspects of the present disclosure relate to reducing methylation of a target nucleic acid in a plant by expressing recombinant TET polypeptides in plants. Exemplary TET polypeptides include TET1, TET2, and TET3. Ten-eleven translocation (TET) proteins are known in the art. It has been shown that expressing TET proteins in cell lines leads to a reduction in 5mC levels and leads to the formation of 5hmC. Mutations in the signature His-Xaa-Asp motif (where Xaa represents any amino acid) of these dioxygenases abolish this activity. His-Xaa-Asp is presented herein as HXD, where X is any amino acid. The TET protein family members also share a conserved cysteine-rich region in addition to the dioxygenase motif (DSBH), which has a role in Fe(II) and 2-oxoglutarate binding. In the presence of the necessary cofactors 2-oxoglutarate and Fe²⁺, TET proteins can efficiently convert 5mC to 5hmC in vitro, and further oxidize 5hmC to 5fC and 5caC.

There is conservation between the amoeba NgTet1 and the catalytic domains of mouse mTet1 and human hTET1. NgTet1 can catalyze the conversion of 5mC to 5hmC, and its structure represents the core structure of the catalytic domains of human TET enzymes. Human TETs have an atypical non-conserved insertion, called CTD-like, between the two halves of the His-Xaa-Asp and C-term His residues. In addition, human TETs have a unique Cys-rich domain at the N-term (residues 1525-1572 in hTET1). With these two insertions removed, NgTet1 and mammalian TETs share 14% identity or 39% similarity. However, both can perform the same catalytic activity. Another conservation involves (i) an invariant proline causing a kink of helix α4 and (ii) helices α5 and α6, which are composed of a stretch of residues predicted to be Tet/JBP specific (see Hashimoto et al., 2014 Feb. 20; 506(7488):391-5). An alignment of the TET catalytic domains is presented in FIG. 66A and FIG. 66B. Other TET protein alignments are presented in FIG. 67 and FIG. 68.

TET proteins are generally considered to be methylcytosine dioxygenases. Certain aspects of the present disclosure relate to use of dioxygenases to reduce methylation of a target nucleic acid. In some embodiments, the catalytic domain of the dioxygenase is used in the methods described herein. The dioxygenase may be a TET polypeptide such as e.g. a TET1 polypeptide, a TET2 polypeptide, a TET3 polypeptide, or the catalytic domain of said polypeptides. In some embodiments, the TET polypeptide includes the amino acid motif HXD or HXR, where X is any amino acid.
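As an illustration of how the HXD and HXR motifs might be located in a candidate sequence, the following Python sketch reports every occurrence with its 1-based position. The function name and the example sequence are hypothetical. A three-residue pattern occurs by chance in many proteins, so a hit indicates only that the motif is present, not that the polypeptide has dioxygenase activity.

```python
import re

# H, then any residue, then D or R. The lookahead lets overlapping
# occurrences (e.g. in "HHDD") be reported individually.
_HXD_HXR = re.compile(r"(?=(H[A-Z][DR]))")


def find_hxd_hxr(protein: str):
    """Return (position, motif) pairs for each HXD or HXR occurrence,
    using 1-based residue positions."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in _HXD_HXR.finditer(protein)]


# Hypothetical toy sequence, not taken from the sequence listing.
print(find_hxd_hxr("MAHKDGGHPRA"))
```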

Certain aspects of the present disclosure relate to use of dioxygenases that use molecular oxygen and the cofactors Fe(II) and 2-oxoglutarate to convert 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) in DNA (together referred to as oxidized methylcytosines or oxi-mC) to reduce methylation of a target nucleic acid.

Certain methods of the present disclosure relate to reducing methylation of a target nucleic acid in a plant by recombinantly fusing a TET polypeptide (e.g. TET1 polypeptide) to a heterologous DNA-binding domain, where the DNA-binding domain is able to bind a specific nucleic acid sequence and thus the TET polypeptide (e.g. TET1 polypeptide) is targeted to the specific nucleic acid sequence. Certain methods of the present disclosure relate to reducing methylation of a target nucleic acid in a plant by targeting a TET polypeptide (e.g. TET1 polypeptide) recombinantly fused to a CAS9 protein to the target nucleic acid. Certain methods of the present disclosure relate to reducing methylation of a target nucleic acid in a plant by targeting a TET polypeptide (e.g. TET1 polypeptide) to a target nucleic acid with the assistance of a CAS9 protein. As used herein, a “polypeptide” is an amino acid sequence including a plurality of consecutive polymerized amino acid residues (e.g., at least about 15 consecutive polymerized amino acid residues). “Polypeptide” refers to an amino acid sequence, oligopeptide, peptide, protein, or portions thereof, and the terms “polypeptide” and “protein” are used interchangeably.

Polypeptides as described herein also include polypeptides having various amino acid additions, deletions, or substitutions relative to the native amino acid sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain non-conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure. In some embodiments, polypeptides that are homologs of a polypeptide of the present disclosure contain conservative changes of certain amino acids relative to the native sequence of a polypeptide of the present disclosure, and thus may be referred to as conservatively modified variants. A conservatively modified variant may include individual substitutions, deletions or additions to a polypeptide sequence which result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well-known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the disclosure. The following eight groups contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)). A modification of an amino acid to produce a chemically similar amino acid may be referred to as an analogous amino acid.
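The eight groups listed above can be written as a small lookup structure. The Python sketch below is illustrative only; the names are hypothetical, and treating two identical residues as "not a substitution" is a choice made for this example rather than something stated in the text. Note that methionine (M) belongs to two groups (5 and 8), so the groups are kept as separate sets rather than a one-to-one mapping.

```python
# The eight conservative substitution groups, in the order listed in the text.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if the two (different) residues share at least one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues: no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative_substitution("D", "E"))
print(is_conservative_substitution("A", "S"))
```

Residues that appear in none of the eight groups, such as histidine (H) and proline (P), are never reported as conservative substitutions by this check.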

Recombinant polypeptides of the present disclosure that are composed of individual polypeptide domains may be described based on the individual polypeptide domains of the overall recombinant polypeptide. A domain in such a recombinant polypeptide refers to the particular stretches of contiguous amino acid sequences with a particular function or activity. For example, in a recombinant polypeptide that is a fusion of a TET polypeptide (e.g. TET1 polypeptide) and a DNA-binding domain, the contiguous amino acids that make up the TET polypeptide (e.g. TET1 polypeptide) may be described as the TET domain (e.g. TET1 domain) in the overall recombinant polypeptide, and the contiguous amino acids that make up the DNA-binding domain may be described as the DNA-binding domain in the overall recombinant polypeptide. Individual domains in an overall recombinant protein may also be referred to as units of the recombinant protein. Recombinant polypeptides that are composed of individual polypeptide domains may also be referred to as fusion polypeptides.

Fusion polypeptides of the present disclosure may contain an individual polypeptide domain that is in various N-terminal or C-terminal orientations relative to other individual polypeptide domains present in the fusion polypeptide. Fusion of individual polypeptide domains in fusion polypeptides may also be direct or indirect fusions. Direct fusions of individual polypeptide domains refer to direct fusion of the coding sequences of each respective individual polypeptide domain. In embodiments where the fusion is indirect, a linker domain or other contiguous amino acid sequence may separate the coding sequences of two individual polypeptide domains in a fusion polypeptide.

Nuclear Localization Signals (NLS)

Recombinant polypeptides of the present disclosure may contain one or more nuclear localization signals (NLS). Nuclear localization signals may also be referred to as nuclear localization sequences, domains, peptides, or other terms readily apparent to those of skill in the art. A nuclear localization signal is a translocation sequence that, when present in a polypeptide, directs that polypeptide to localize to the nucleus of a eukaryotic cell.

Various nuclear localization signals may be used in recombinant polypeptides of the present disclosure. For example, one or more SV40-type NLS or one or more REX NLS may be used in recombinant polypeptides. Recombinant polypeptides may also contain two or more tandem copies of a nuclear localization signal. For example, recombinant polypeptides may contain at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten copies, either tandem or not, of a nuclear localization signal.

Recombinant polypeptides of the present disclosure may contain one or more nuclear localization signals that contain an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of any one of SEQ ID NO: 36, SEQ ID NO: 43, SEQ ID NO: 60, SEQ ID NO: 72, SEQ ID NO: 112, SEQ ID NO: 113, and/or SEQ ID NO: 127.

TET1 Proteins

Certain aspects of the present disclosure relate to TET1-like proteins. In some embodiments, a TET1-like protein refers to a recombinant TET1 protein or fragment thereof that contains a heterologous DNA-binding domain. In some embodiments, a TET1-like protein refers to a recombinant TET1 protein or fragment thereof that is fused to a CAS9 protein or fragment thereof. In some embodiments, a TET1-like protein refers to a recombinant TET1 protein or fragment thereof that is fused to an MS2 coat protein or fragment thereof. In some embodiments, a TET1-like protein refers to a recombinant TET1 protein or fragment thereof that is fused to an scFv antibody or fragment thereof. TET1-like proteins may be used in reducing methylation of one or more target nucleic acids, such as genes, in plants.

TET1 is an enzyme that catalyzes the conversion of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) (Tahiliani, M. et al. Science 324, 930-935 (2009)). While the role of 5hmC is not entirely clear, it has been proposed that it may be an intermediate in the process of demethylation of 5-methylcytosine to cytosine. This is supported by evidence that overexpression of TET1 in cultured cells leads to an overall decrease in levels of 5mC (Tahiliani, M. et al. Science 324, 930-935 (2009)). Several mechanisms of demethylation have been proposed. There is evidence that 5hmC can be deaminated and that the resulting mismatched base is recognized by DNA glycosylases and subsequently repaired to cytosine via the base excision repair pathway (Guo et al., Cell 145, 423-434 (2011)). Alternatively, there is also evidence that iterative oxidation of 5hmC by TET1 yields 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which can then be recognized by thymine DNA glycosylase and reverted to cytosine through base excision repair (He et al., Science 333, 1303-1307 (2011)). In either case, the evidence highlights TET1 as a primary catalyst for DNA demethylation.

In some embodiments, a TET1-like protein of the present disclosure includes a functional fragment of a full-length TET1 protein where the fragment maintains the ability to catalyze demethylation of a nucleic acid. In some embodiments, a TET1 protein fragment contains at least 20 consecutive amino acids, at least 30 consecutive amino acids, at least 40 consecutive amino acids, at least 50 consecutive amino acids, at least 60 consecutive amino acids, at least 70 consecutive amino acids, at least 80 consecutive amino acids, at least 90 consecutive amino acids, at least 100 consecutive amino acids, at least 120 consecutive amino acids, at least 140 consecutive amino acids, at least 160 consecutive amino acids, at least 180 consecutive amino acids, at least 200 consecutive amino acids, at least 220 consecutive amino acids, at least 240 consecutive amino acids, or 241 or more consecutive amino acids of a full-length TET1 protein. In some embodiments, TET1 protein fragments may include sequences with one or more amino acids removed from the consecutive amino acid sequence of a full-length TET1 protein. In some embodiments, TET1 protein fragments may include sequences with one or more amino acids replaced/substituted with an amino acid different from the endogenous amino acid present at a given amino acid position in a consecutive amino acid sequence of a full-length TET1 protein. In some embodiments, TET1 protein fragments may include sequences with one or more amino acids added to an otherwise consecutive amino acid sequence of a full-length TET1 protein.

Suitable TET1 proteins may be identified and isolated from various mammalian organisms. Examples of such organisms may include, for example, Homo sapiens, Pan paniscus, Gorilla gorilla, Mandrillus leucophaeus, Equus caballus, Canis lupus familiaris, and Ovis aries. Examples of suitable TET1 proteins may include, for example, those listed in Table 1, homologs thereof, and orthologs thereof.

TABLE 1
TET1 Proteins

Organism                  Gene Name         SEQ ID NO.
Homo sapiens              NP_085128         1
Pan paniscus              XP_003846089.1    2
Gorilla gorilla           XP_004049552.1    3
Mandrillus leucophaeus    XP_011849484      4
Equus caballus            XP_005602635      5
Canis lupus familiaris    XP_536371         6
Ovis aries                XP_011960588      7

In some embodiments, a TET1 protein or fragment thereof of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of the Homo sapiens TET1 protein (SEQ ID NO: 1).

A TET1-like protein may include the amino acid sequence or a fragment thereof of any TET1 homolog or ortholog, such as any one of those listed in Table 1. One of skill would readily recognize that additional TET1 homologs and/or orthologs may exist and may be used herein.

In certain aspects, the catalytic domain of a TET1 protein may be used in the methods and compositions described herein. The catalytic domain of TET1 is responsible for facilitating demethylation of a nucleic acid. Examples of suitable TET1 catalytic domains may include, for example, those listed in Table 2, homologs thereof, and orthologs thereof.

TABLE 2
TET1 Protein Catalytic Domains

Organism                  Gene Name         SEQ ID NO.
Homo sapiens              NP_085128         8
Pan paniscus              XP_003846089.1    9
Gorilla gorilla           XP_004049552.1    10
Mandrillus leucophaeus    XP_011849484      11
Equus caballus            XP_005602635      12
Canis lupus familiaris    XP_536371         13
Ovis aries                XP_011960588      14

In some embodiments, a TET1 protein catalytic domain of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of the Homo sapiens TET1 protein catalytic domain (SEQ ID NO: 8).

A TET1-like protein may include the amino acid sequence or a fragment thereof of the catalytic domain of any TET1 homolog or ortholog, such as any one of those listed in Table 2. One of skill would readily recognize that catalytic domains from additional TET1 homologs and/or orthologs may exist and may be used herein.

TET2 Proteins

Certain aspects of the present disclosure relate to TET2-like proteins. In some embodiments, a TET2-like protein refers to a recombinant TET2 protein or fragment thereof that contains a heterologous DNA-binding domain. In some embodiments, a TET2-like protein refers to a recombinant TET2 protein or fragment thereof that is fused to a CAS9 protein or fragment thereof. In some embodiments, a TET2-like protein refers to a recombinant TET2 protein or fragment thereof that is fused to an MS2 coat protein or fragment thereof. In some embodiments, a TET2-like protein refers to a recombinant TET2 protein or fragment thereof that is fused to an scFV antibody or fragment thereof. TET2-like proteins may be used in reducing methylation of one or more target nucleic acids, such as genes, in plants.

In some embodiments, a TET2-like protein of the present disclosure includes a functional fragment of a full-length TET2 protein where the fragment maintains the ability to catalyze demethylation of a nucleic acid. In some embodiments, a TET2 protein fragment contains at least 20 consecutive amino acids, at least 30 consecutive amino acids, at least 40 consecutive amino acids, at least 50 consecutive amino acids, at least 60 consecutive amino acids, at least 70 consecutive amino acids, at least 80 consecutive amino acids, at least 90 consecutive amino acids, at least 100 consecutive amino acids, at least 120 consecutive amino acids, at least 140 consecutive amino acids, at least 160 consecutive amino acids, at least 180 consecutive amino acids, at least 200 consecutive amino acids, at least 220 consecutive amino acids, at least 240 consecutive amino acids, or 241 or more consecutive amino acids of a full-length TET2 protein. In some embodiments, TET2 protein fragments may include sequences with one or more amino acids removed from the consecutive amino acid sequence of a full-length TET2 protein. In some embodiments, TET2 protein fragments may include sequences with one or more amino acids replaced/substituted with an amino acid different from the endogenous amino acid present at a given amino acid position in a consecutive amino acid sequence of a full-length TET2 protein. In some embodiments, TET2 protein fragments may include sequences with one or more amino acids added to an otherwise consecutive amino acid sequence of a full-length TET2 protein.

Suitable TET2 proteins may be identified and isolated from various mammalian organisms. The amino acid sequence of human TET2 protein is set forth in SEQ ID NO: 191.

In some embodiments, a TET2 protein or fragment thereof of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of the Homo sapiens TET2 protein (SEQ ID NO: 191).

In certain aspects, the catalytic domain of a TET2 protein may be used in the methods and compositions described herein. The catalytic domain of TET2 is responsible for facilitating demethylation of a nucleic acid. The amino acid sequence of the catalytic domain of human TET2 protein is set forth in SEQ ID NO: 192.

In some embodiments, a TET2 protein catalytic domain of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of the Homo sapiens TET2 protein catalytic domain (SEQ ID NO: 192).

A TET2-like protein may include the amino acid sequence or a fragment thereof of the catalytic domain of any TET2 homolog or ortholog. One of skill would readily recognize that catalytic domains from additional TET2 homologs and/or orthologs may exist and may be used herein.

TET3 Proteins

Certain aspects of the present disclosure relate to TET3-like proteins. In some embodiments, a TET3-like protein refers to a recombinant TET3 protein or fragment thereof that contains a heterologous DNA-binding domain. In some embodiments, a TET3-like protein refers to a recombinant TET3 protein or fragment thereof that is fused to a CAS9 protein or fragment thereof. In some embodiments, a TET3-like protein refers to a recombinant TET3 protein or fragment thereof that is fused to an MS2 coat protein or fragment thereof. In some embodiments, a TET3-like protein refers to a recombinant TET3 protein or fragment thereof that is fused to an scFV antibody or fragment thereof. TET3-like proteins may be used in reducing methylation of one or more target nucleic acids, such as genes, in plants.

In some embodiments, a TET3-like protein of the present disclosure includes a functional fragment of a full-length TET3 protein where the fragment maintains the ability to catalyze demethylation of a nucleic acid. In some embodiments, a TET3 protein fragment contains at least 20 consecutive amino acids, at least 30 consecutive amino acids, at least 40 consecutive amino acids, at least 50 consecutive amino acids, at least 60 consecutive amino acids, at least 70 consecutive amino acids, at least 80 consecutive amino acids, at least 90 consecutive amino acids, at least 100 consecutive amino acids, at least 120 consecutive amino acids, at least 140 consecutive amino acids, at least 160 consecutive amino acids, at least 180 consecutive amino acids, at least 200 consecutive amino acids, at least 220 consecutive amino acids, at least 240 consecutive amino acids, or 241 or more consecutive amino acids of a full-length TET3 protein. In some embodiments, TET3 protein fragments may include sequences with one or more amino acids removed from the consecutive amino acid sequence of a full-length TET3 protein. In some embodiments, TET3 protein fragments may include sequences with one or more amino acids replaced/substituted with an amino acid different from the endogenous amino acid present at a given amino acid position in a consecutive amino acid sequence of a full-length TET3 protein. In some embodiments, TET3 protein fragments may include sequences with one or more amino acids added to an otherwise consecutive amino acid sequence of a full-length TET3 protein.

Suitable TET3 proteins may be identified and isolated from various mammalian organisms. The amino acid sequence of human TET3 protein is set forth in SEQ ID NO: 193.

In some embodiments, a TET3 protein or fragment thereof of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of the Homo sapiens TET3 protein (SEQ ID NO: 193).

In certain aspects, the catalytic domain of a TET3 protein may be used in the methods and compositions described herein. The catalytic domain of TET3 is responsible for facilitating demethylation of a nucleic acid. The amino acid sequence of the catalytic domain of human TET3 protein is set forth in SEQ ID NO: 194.

In some embodiments, a TET3 protein catalytic domain of the present disclosure has an amino acid sequence with at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% amino acid identity to the amino acid sequence of the Homo sapiens TET3 protein catalytic domain (SEQ ID NO: 194).

A TET3-like protein may include the amino acid sequence or a fragment thereof of the catalytic domain of any TET3 homolog or ortholog. One of skill would readily recognize that catalytic domains from additional TET3 homologs and/or orthologs may exist and may be used herein.

Recombinant Nucleic Acids Encoding Recombinant Proteins

Certain aspects of the present disclosure relate to recombinant nucleic acids encoding recombinant proteins of the present disclosure (e.g. TET-like proteins, such as TET1-like proteins). In some embodiments, a TET-like protein (e.g. TET1-like protein) is a recombinant TET protein (e.g. TET1 protein) or fragment thereof that contains a heterologous DNA-binding domain. In some embodiments, a TET-like protein (e.g. TET1-like protein) is a recombinant TET protein (e.g. TET1 protein) or fragment thereof that is fused to a CAS9 protein or fragment thereof. In some embodiments, a TET-like protein (e.g. TET1-like protein) is a recombinant TET protein (e.g. TET1 protein) or fragment thereof that is fused to an MS2 coat protein or fragment thereof. In some embodiments, a TET-like protein (e.g. TET1-like protein) is a recombinant TET protein (e.g. TET1 protein) or fragment thereof that is fused to an scFV antibody or fragment thereof.

As used herein, the terms “polynucleotide,” “nucleic acid,” and variations thereof shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base, and to other polymers containing non-nucleotidic backbones, provided that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, as found in DNA and RNA. Thus, these terms include known types of nucleic acid sequence modifications, for example, substitution of one or more of the naturally occurring nucleotides with an analog, and inter-nucleotide modifications. As used herein, the symbols for nucleotides and polynucleotides are those recommended by the IUPAC-IUB Commission of Biochemical Nomenclature.

In one aspect, the present disclosure provides a recombinant nucleic acid encoding a TET1-like protein. In some embodiments, the recombinant nucleic acid encodes a TET1 polypeptide or fragment thereof that has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to SEQ ID NO: 1.

In one aspect, the present disclosure provides a recombinant nucleic acid encoding a TET1-like protein. In some embodiments, the recombinant nucleic acid encodes a catalytic domain of a TET1 protein that has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to SEQ ID NO: 8.

In one aspect, the present disclosure provides a recombinant nucleic acid encoding a TET2-like protein. In some embodiments, the recombinant nucleic acid encodes a catalytic domain of a TET2 protein that has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to SEQ ID NO: 192.

In one aspect, the present disclosure provides a recombinant nucleic acid encoding a TET3-like protein. In some embodiments, the recombinant nucleic acid encodes a catalytic domain of a TET3 protein that has an amino acid sequence that is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to SEQ ID NO: 194.

Sequences of the polynucleotides of the present disclosure may be prepared by various suitable methods known in the art, including, for example, direct chemical synthesis or cloning. For direct chemical synthesis, formation of a polymer of nucleic acids typically involves sequential addition of 3′-blocked and 5′-blocked nucleotide monomers to the terminal 5′-hydroxyl group of a growing nucleotide chain, wherein each addition is effected by nucleophilic attack of the terminal 5′-hydroxyl group of the growing chain on the 3′-position of the added monomer, which is typically a phosphorus derivative, such as a phosphotriester, phosphoramidite, or the like. Such methodology is known to those of ordinary skill in the art and is described in the pertinent texts and literature (e.g., in Matteucci et al., (1980) Tetrahedron Lett 21:719-722; U.S. Pat. Nos. 4,500,707; 5,436,327; and 5,700,637). In addition, the desired sequences may be isolated from natural sources by splitting DNA using appropriate restriction enzymes, separating the fragments using gel electrophoresis, and thereafter, recovering the desired polynucleotide sequence from the gel via techniques known to those of ordinary skill in the art, such as utilization of polymerase chain reactions (PCR; e.g., U.S. Pat. No. 4,683,195).

The nucleic acids employed in the methods and compositions described herein may be codon optimized relative to a parental template for expression in a particular host cell. Cells differ in their usage of particular codons, and codon bias corresponds to relative abundance of particular tRNAs in a given cell type. By altering codons in a sequence so that they are tailored to match with the relative abundance of corresponding tRNAs, it is possible to increase expression of a product (e.g. a polypeptide) from a nucleic acid. Similarly, it is possible to decrease expression by deliberately choosing codons corresponding to rare tRNAs. Thus, codon optimization/deoptimization can provide control over nucleic acid expression in a particular cell type (e.g. bacterial cell, plant cell, mammalian cell, etc.). Methods of codon optimizing a nucleic acid for tailored expression in a particular cell type are well-known to those of skill in the art.
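As a minimal illustration of the codon-tailoring idea described above, the following Python sketch back-translates a short peptide by choosing, for each residue, the codon with the highest relative usage (optimization) or the lowest (deoptimization). The usage values, the table name CODON_USAGE, and the function name back_translate are hypothetical and for illustration only; an actual table would be derived from the host cell of interest, and practical tools weigh further factors that this sketch ignores.

```python
# Hypothetical relative codon usage values (illustrative only, not measured data).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTT": 0.25, "CTC": 0.20, "CTG": 0.10, "TTG": 0.45},
}


def back_translate(peptide, usage=CODON_USAGE, optimize=True):
    """Return a coding sequence that uses, for each residue, the most used
    codon (optimize=True) or the least used codon (optimize=False)."""
    pick = max if optimize else min
    codons = []
    for residue in peptide:
        table = usage[residue]  # raises KeyError if the residue is not in the table
        codons.append(pick(table, key=table.get))
    return "".join(codons)


print(back_translate("MKL"))                  # codons with the highest usage values
print(back_translate("MKL", optimize=False))  # codons with the lowest usage values
```

Selecting a single extreme codon per residue is the simplest possible policy; it is shown only to make the relationship between codon choice and tRNA abundance concrete.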

Methods of Identifying Sequence Similarity

Various methods are known to those of skill in the art for identifying similar (e.g. homologs, orthologs, paralogs, etc.) polypeptide and/or polynucleotide sequences, including phylogenetic methods, sequence similarity analysis, and hybridization methods.

Phylogenetic trees may be created for a gene family by using a program such as CLUSTAL (Thompson et al. Nucleic Acids Res. 22: 4673-4680 (1994); Higgins et al. Methods Enzymol 266: 383-402 (1996)) or MEGA (Tamura et al. Mol. Biol. & Evo. 24:1596-1599 (2007)). Once an initial tree for genes from one species is created, potential orthologous sequences can be placed in the phylogenetic tree and their relationships to genes from the species of interest can be determined. Evolutionary relationships may also be inferred using the Neighbor-Joining method (Saitou and Nei, Mol. Biol. & Evo. 4:406-425 (1987)). Homologous sequences may also be identified by a reciprocal BLAST strategy. Evolutionary distances may be computed using the Poisson correction method (Zuckerkandl and Pauling, pp. 97-166 in Evolving Genes and Proteins, edited by V. Bryson and H. J. Vogel. Academic Press, New York (1965)).
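The Poisson correction mentioned above converts the observed proportion p of differing amino acid sites between two aligned sequences into a distance d = -ln(1 - p). The Python sketch below shows that calculation under simple assumptions: the inputs are already aligned to equal length, and columns containing a gap character ("-") are skipped. The function names are illustrative and are not taken from any particular software package.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned, non-gap positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# One difference in four sites: p = 0.25, d = -ln(0.75)
print(p_distance("MKLV", "MKIV"), poisson_distance("MKLV", "MKIV"))
```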

In addition, evolutionary information may be used to predict gene function. Functional predictions of genes can be greatly improved by focusing on how genes became similar in sequence (i.e. by evolutionary processes) rather than on the sequence similarity itself (Eisen, Genome Res. 8: 163-167 (1998)). Many specific examples exist in which gene function has been shown to correlate well with gene phylogeny (Eisen, Genome Res. 8: 163-167 (1998)). By using a phylogenetic analysis, one skilled in the art would recognize that the ability to deduce similar functions conferred by closely-related polypeptides is predictable.

When a group of related sequences is analyzed using a phylogenetic program such as CLUSTAL, closely related sequences typically cluster together or in the same clade (a group of similar genes). Groups of similar genes can also be identified with pair-wise BLAST analysis (Feng and Doolittle, J. Mol. Evol. 25: 351-360 (1987)). Analysis of groups of similar genes with similar function that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can be used not only to define the sequences within each clade, but also to define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543 (2001)).

To find sequences that are homologous to a reference sequence, BLAST nucleotide searches can be performed with the BLASTN program, score=100, wordlength=12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the disclosure. BLAST protein searches can be performed with the BLASTX program, score=50, wordlength=3, to obtain amino acid sequences homologous to a protein or polypeptide of the disclosure. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used.

Methods for the alignment of sequences and for the analysis of similarity and identity of polypeptide and polynucleotide sequences are well-known in the art.

As used herein “sequence identity” refers to the percentage of residues that are identical in the same positions in the sequences being analyzed. As used herein “sequence similarity” refers to the percentage of residues that have similar biophysical/biochemical characteristics in the same positions (e.g. charge, size, hydrophobicity) in the sequences being analyzed.
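The two definitions above can be made concrete with a short Python sketch that computes percent identity and percent similarity for a pair of pre-aligned amino acid sequences. The residue groupings in SIMILAR_GROUPS are a hypothetical, simplified classification chosen for illustration; the denominator used here (aligned columns without gaps) is likewise an assumption, since different programs count positions differently.

```python
# Hypothetical grouping of residues by shared biochemical character (illustrative only).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def identity_and_similarity(aln_a, aln_b):
    """Return (percent identity, percent similarity) over aligned, non-gap columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no comparable positions")
    identical = sum(a == b for a, b in columns)
    # Identical residues also count as similar.
    similar = sum(
        a == b or any(a in group and b in group for group in SIMILAR_GROUPS)
        for a, b in columns
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


# M/M and V/V are identical; K/R and L/I fall in the same illustrative groups.
print(identity_and_similarity("MKLV", "MRIV"))
```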

Methods of alignment of sequences for comparison are well-known in the art, including manual alignment and computer assisted sequence alignment and analysis. This latter approach is a preferred approach in the present disclosure, due to the increased throughput afforded by computer assisted methods. As noted below, a variety of computer programs for performing sequence alignment are available, or can be produced by one of skill.

The determination of percent sequence identity and/or similarity between any two sequences can be accomplished using a mathematical algorithm. Examples of such mathematical algorithms are the algorithm of Myers and Miller, CABIOS 4:11-17 (1988); the local homology algorithm of Smith et al., Adv. Appl. Math. 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); the search-for-similarity-method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444-2448 (1988); the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA 87:2264-2268 (1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993).
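For orientation, the global alignment approach of Needleman and Wunsch cited above can be sketched as a dynamic-programming recurrence. The Python function below returns only the optimal global alignment score and keeps two rows of the scoring matrix in memory. The scoring values (match = 1, mismatch = -1, linear gap penalty = -1) are arbitrary assumptions for illustration; published implementations typically use substitution matrices and other gap models.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Three matches and one gap: ACGT aligned against A-GT
print(needleman_wunsch_score("ACGT", "AGT"))
```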

Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity and/or similarity. Such implementations include, for example: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the AlignX program, version 10.3.0 (Invitrogen, Carlsbad, Calif.) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Res. 16:10881-90 (1988); Huang et al. CABIOS 8:155-65 (1992); and Pearson et al., Meth. Mol. Biol. 24:307-331 (1994). The BLAST programs of Altschul et al. J. Mol. Biol. 215:403-410 (1990) are based on the algorithm of Karlin and Altschul (1990) supra.

Polynucleotides homologous to a reference sequence can be identified by hybridization to each other under stringent or under highly stringent conditions. Single stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in references cited below (e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (“Sambrook”) (1989); Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152 Academic Press, Inc., San Diego, Calif. (“Berger and Kimmel”) (1987); and Anderson and Young, “Quantitative Filter Hybridisation.” In: Hames and Higgins, ed., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111 (1985)).

Encompassed by the disclosure are polynucleotide sequences that are capable of hybridizing to the disclosed polynucleotide sequences and fragments thereof under various conditions of stringency (see, for example, Wahl and Berger, Methods Enzymol. 152: 399-407 (1987); and Kimmel, Methods Enzymol. 152: 507-511 (1987)). Full length cDNA, homologs, orthologs, and paralogs of polynucleotides of the present disclosure may be identified and isolated using well-known polynucleotide hybridization methods.

With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) (supra); Berger and Kimmel (1987) pp. 467-469 (supra); and Anderson and Young (1985) (supra).

Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson and Young (1985) (supra)). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecyl sulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.

Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or to highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency. As a general guideline, high stringency is typically performed at T_(m)−5° C. to T_(m)−20° C., moderate stringency at T_(m)−20° C. to T_(m)−35° C. and low stringency at T_(m)−35° C. to T_(m)−50° C. for duplex >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50° C. below T_(m)), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at T_(m)−25° C. for DNA-DNA duplex and T_(m)−15° C. for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.
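The general guideline above can be expressed as a small lookup on the number of degrees by which the hybridization temperature lies below T_(m). The Python sketch below is illustrative only: the function name is hypothetical, it applies to duplexes of more than 150 base pairs as stated in the guideline, and the choice to assign the shared boundaries (20° C. and 35° C. below T_(m)) to the more stringent class is an assumption, since the stated ranges meet at those values.

```python
def classify_stringency(tm_c, hyb_temp_c):
    """Classify a hybridization temperature by its offset below Tm (duplexes >150 bp).

    Boundary offsets shared by two ranges are assigned to the more stringent class,
    which is an assumption of this sketch.
    """
    offset = tm_c - hyb_temp_c  # degrees C below Tm
    if 5 <= offset <= 20:
        return "high"
    if 20 < offset <= 35:
        return "moderate"
    if 35 < offset <= 50:
        return "low"
    return "outside guideline ranges"


# With an assumed Tm of 85 degrees C:
for temp in (70, 60, 45, 84):
    print(temp, classify_stringency(85, temp))
```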

High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5° C. to 20° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH.

Hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements of the present disclosure include, for example: 6×SSC and 1% SDS at 65° C.; 50% formamide, 4×SSC at 42° C.; 0.5×SSC to 2.0×SSC, 0.1% SDS at 50° C. to 65° C.; or 0.1×SSC to 2×SSC, 0.1% SDS at 50° C.-65° C.; with a first wash step of, for example, 10 minutes at about 42° C. with about 20% (v/v) formamide in 0.1×SSC, and with, for example, a subsequent wash step with 0.2×SSC and 0.1% SDS at 65° C. for 10, 20 or 30 minutes.

For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50° C. An example of a low stringency wash step employs a solution and conditions of at least 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42° C. in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).
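The salt concentrations given above in millimolar units can be related to the "×SSC" notation used elsewhere in this section. The Python sketch below assumes the conventional definition of 1×SSC as 150 mM NaCl with 15 mM trisodium citrate; under that assumption, 30 mM NaCl with 3 mM trisodium citrate corresponds to 0.2×SSC, and 15 mM NaCl with 1.5 mM trisodium citrate corresponds to 0.1×SSC. The constant and function names are illustrative.

```python
import math

# Conventional 1x SSC composition (an assumption stated in the lead-in).
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_fold(nacl_mm, citrate_mm, rel_tol=1e-6):
    """Return the SSC fold concentration for given NaCl and trisodium citrate
    concentrations in mM, or raise if the pair is not a simple SSC dilution."""
    nacl_fold = nacl_mm / SSC_1X_NACL_MM
    citrate_fold = citrate_mm / SSC_1X_CITRATE_MM
    if not math.isclose(nacl_fold, citrate_fold, rel_tol=rel_tol):
        raise ValueError("salt concentrations are not a simple SSC dilution")
    return nacl_fold


print(ssc_fold(30, 3))     # low stringency wash example
print(ssc_fold(15, 1.5))   # greater stringency wash example
```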

If desired, one may employ wash steps of even greater stringency, including conditions of 65° C.-68° C. in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS, or about 0.2×SSC, 0.1% SDS at 65° C. and washing twice, each wash step of 10, 20 or 30 min in duration, or about 0.1×SSC, 0.1% SDS at 65° C. and washing twice for 10, 20 or 30 min. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3° C. to about 5° C., and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6° C. to about 9° C.

Target Nucleic Acids of the Present Disclosure

The recombinant TET-like proteins (e.g. TET1-like proteins) of the present disclosure may be targeted to specific target nucleic acids to induce demethylation of the target nucleic acid. In some embodiments, TET-like proteins (e.g. TET1-like proteins) are targeted to a specific nucleic acid via a heterologous DNA-binding domain. In some embodiments, TET-like proteins (e.g. TET1-like proteins) reduce methylation of a target nucleic acid by being targeted to the nucleic acid by a guide RNA. In this sense, a target nucleic acid of the present disclosure is targeted based on the particular nucleotide sequence in the target nucleic acid that is recognized by the targeting portion of a TET-like polypeptide such as a TET1-like polypeptide (e.g. DNA-binding domain or guide RNA).

In some embodiments, a target nucleic acid of the present disclosure is a nucleic acid that is located at any location within a target gene that provides a suitable location for reducing methylation of the target gene. The target nucleic acid may be located within the coding region of a target gene or upstream or downstream thereof. Moreover, the target nucleic acid may reside endogenously in a target gene or may be heterologous, e.g., inserted into the gene using techniques such as homologous recombination. For example, a target gene of the present disclosure can be operably linked to a control region, such as a promoter, that contains a sequence that can be recognized by e.g. a crRNA/tracrRNA and/or a guide RNA of the present disclosure such that recombinant TET-like proteins (e.g. TET1-like proteins) of the present disclosure are targeted to that sequence. Also, the target nucleic acid may be one that is able to be bound by a DNA-binding domain that is recombinantly fused to a TET-like protein (e.g. TET1-like protein) of the present disclosure.

In some embodiments, the target nucleic acid is endogenous to the plant where the expression of one or more genes is modulated by a TET-like protein (e.g. TET1-like protein) as a result of reduced methylation at the target nucleic acid as facilitated by the TET-like protein (e.g. TET1-like protein). In some embodiments, the target nucleic acid is a transgene of interest that has been inserted into a plant. Methods of introducing transgenes into plants are well known in the art. Transgenes may be inserted into plants in order to provide a production system for a desired protein, or may be added to the genetic complement in order to modulate the metabolism of a plant. In some embodiments, the expression of a target nucleic acid is increased as a consequence of the methods of the present disclosure using TET-like proteins (e.g. TET1-like proteins).

Suitable target nucleic acids will be readily apparent to one of skill in the art depending on the particular need or outcome. The target nucleic acid may be in e.g. a region of euchromatin (e.g. highly expressed gene), or the target nucleic acid may be in a region of heterochromatin (e.g. centromere DNA). Use of TET-like proteins (e.g. TET1-like proteins) as described herein to target demethylation and transcript activation in a region of heterochromatin or other highly methylated region of a plant genome may be especially useful in certain research embodiments. For example, use of TET1-like proteins to demethylate and activate a retrotransposon in a plant genome may find use in inducing mutagenesis of other genomic regions in that genome.

In some embodiments, a target nucleic acid may have its expression downregulated/reduced, or silenced, by a TET-like protein (e.g. TET1-like protein) according to the methods of the present disclosure. The particular nature of the target nucleic acid, and the role that methylation of that nucleic acid plays with respect to expression of that target nucleic acid, are factors that may govern whether a particular target nucleic acid may have its expression increased or decreased as compared to a corresponding control nucleic acid according to the methods of the present disclosure. Reduction in methylation of a target nucleic acid may lead to increased expression, or reduction in methylation may lead to decreased expression, as compared to a corresponding control.

Plants of the Present Disclosure

Certain aspects of the present disclosure relate to plants containing TET-like proteins (e.g. TET1-like proteins) that are targeted to one or more target nucleic acids in the plant and reduce the methylation level of the one or more target nucleic acids.

As used herein, a “plant” refers to any of various photosynthetic, eukaryotic multi-cellular organisms of the kingdom Plantae, characteristically producing embryos, containing chloroplasts, having cellulose cell walls and lacking locomotion. As used herein, a “plant” includes any plant or part of a plant at any stage of development, including seeds, suspension cultures, plant cells, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, microspores, and progeny thereof. Also included are cuttings, and cell or tissue cultures. As used in conjunction with the present disclosure, plant tissue includes, for example, whole plants, plant cells, plant organs, e.g., leaves, stems, roots, meristems, plant seeds, protoplasts, callus, cell cultures, and any groups of plant cells organized into structural and/or functional units.

Any plant cell may be used in the present disclosure so long as it remains viable after being transformed with a sequence of nucleic acids. Preferably, the plant cell is not adversely affected by the transduction of the necessary nucleic acid sequences, the subsequent expression of the proteins or the resulting intermediates.

As disclosed herein, a broad range of plant types may be modified to incorporate a TET1-like protein of the present disclosure. Suitable plants that may be modified include both monocotyledonous (monocot) plants and dicotyledonous (dicot) plants.

Examples of suitable plants may include, for example, species of the Family Gramineae, including Sorghum bicolor and Zea mays; species of the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Pisum, Phaseolus, Lolium, Oryza, Avena, Hordeum, Secale, and Triticum.

In some embodiments, plant cells may include, for example, those from corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), Sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), duckweed (Lemna), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), Citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), Macadamia (Macadamia spp.), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.

Examples of suitable vegetable plants may include, for example, tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).

Examples of suitable ornamental plants may include, for example, azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), Hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.

Examples of suitable conifer plants may include, for example, loblolly pine (Pinus taeda), slash pine (Pinus elliottii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), Monterey pine (Pinus radiata), Douglas-fir (Pseudotsuga menziesii), Western hemlock (Tsuga canadensis), Sitka spruce (Picea glauca), redwood (Sequoia sempervirens), silver fir (Abies amabilis), balsam fir (Abies balsamea), Western red cedar (Thuja plicata), and Alaska yellow-cedar (Chamaecyparis nootkatensis).

Examples of suitable leguminous plants may include, for example, guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, peanuts (Arachis sp.), crown vetch (Vicia sp.), hairy vetch, adzuki bean, lupine (Lupinus sp.), trifolium, common bean (Phaseolus sp.), field bean (Pisum sp.), clover (Melilotus sp.), Lotus, trefoil, lens, and false indigo.

Examples of suitable forage and turf grass may include, for example, alfalfa (Medicago sp.), orchard grass, tall fescue, perennial ryegrass, creeping bent grass, and redtop.

Examples of suitable crop plants and model plants may include, for example, Arabidopsis, corn, rice, alfalfa, sunflower, canola, soybean, cotton, peanut, Sorghum, wheat, tobacco, and lemna.

The plants of the present disclosure may be genetically modified in that recombinant nucleic acids have been introduced into the plants, and as such the genetically modified plants do not occur in nature. A suitable plant of the present disclosure is one capable of expressing one or more nucleic acid constructs encoding one or more recombinant proteins. The recombinant proteins encoded by the nucleic acids may be e.g. TET1-like proteins.

As used herein, the terms “transgenic plant” and “genetically modified plant” are used interchangeably and refer to a plant which contains within its genome a recombinant nucleic acid. Generally, the recombinant nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations. However, in certain embodiments, the recombinant nucleic acid is transiently expressed in the plant. The recombinant nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of exogenous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic.

“Recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide” as used herein refers to a polymer of nucleic acids wherein at least one of the following is true: (a) the sequence of nucleic acids is foreign to (i.e., not naturally found in) a given host cell; (b) the sequence may be naturally found in a given host cell, but in an unnatural (e.g., greater than expected) amount; or (c) the sequence of nucleic acids contains two or more subsequences that are not found in the same relationship to each other in nature. For example, regarding instance (c), a recombinant nucleic acid sequence will have two or more sequences from unrelated genes arranged to make a new functional nucleic acid. Specifically, the present disclosure describes the introduction of an expression vector into a plant cell, where the expression vector contains a nucleic acid sequence coding for a protein that is not normally found in a plant cell or contains a nucleic acid coding for a protein that is normally found in a plant cell but is under the control of different regulatory sequences. With reference to the plant cell's genome, then, the nucleic acid sequence that codes for the protein is recombinant. A protein that is referred to as recombinant generally implies that it is encoded by a recombinant nucleic acid sequence which may be present in the plant cell. Recombinant proteins of the present disclosure may also be exogenously supplied directly to host cells (e.g. plant cells).

A “recombinant” polypeptide, protein, or enzyme of the present disclosure is a polypeptide, protein, or enzyme that is encoded by a “recombinant nucleic acid” or “heterologous nucleic acid” or “recombinant polynucleotide.”

In some embodiments, the genes encoding the recombinant proteins in the plant cell may be heterologous to the plant cell. In certain embodiments, the plant cell does not naturally produce the recombinant proteins, and contains heterologous nucleic acid constructs capable of expressing one or more genes necessary for producing those molecules. In certain embodiments, the plant cell does not naturally produce one or more polypeptides of the present disclosure, and is provided the one or more polypeptides through exogenous delivery of the polypeptides directly to the plant cell without the need to express a recombinant nucleic acid encoding the recombinant polypeptide in the plant cell.

Recombinant nucleic acids and/or recombinant proteins of the present disclosure may be present in host cells (e.g. plant cells). In some embodiments, recombinant nucleic acids are present in an expression vector, and the expression vector may be present in host cells (e.g. plant cells).

Expression of Recombinant Proteins in Plants

A TET-like protein (e.g. TET1-like protein) of the present disclosure may be introduced into plant cells via any suitable methods known in the art. For example, a TET-like protein (e.g. TET1-like protein) can be exogenously added to plant cells and the plant cells are maintained under conditions such that the TET-like protein (e.g. TET1-like protein) is targeted to one or more target nucleic acids and reduces the methylation of the target nucleic acids in the plant cells. Alternatively, a recombinant nucleic acid encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure can be expressed in plant cells and the plant cells are maintained under conditions such that the TET-like protein (e.g. TET1-like protein) of the present disclosure is targeted to one or more target nucleic acids and reduces the methylation of the target gene in the plant cells. Additionally, in some embodiments, a TET-like protein (e.g. TET1-like protein) of the present disclosure may be transiently expressed in a plant via viral infection of the plant, or by introducing a TET-like (e.g. TET1-like) protein-encoding RNA into a plant to reduce the methylation of a target nucleic acid of interest. Methods of introducing recombinant proteins via viral infection or via the introduction of RNAs into plants are well known in the art. For example, Tobacco rattle virus (TRV) has been successfully used to introduce zinc finger nucleases in plants to cause genome modification (“Nontransgenic Genome Modification in Plant Cells”, Plant Physiology 154:1079-1087 (2010)).

A recombinant nucleic acid encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure can be expressed in a plant with any suitable plant expression vector. Typical vectors useful for expression of recombinant nucleic acids in higher plants are well known in the art and include, for example, vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens (e.g., see Rogers et al., Meth. in Enzymol. (1987) 153:253-277). These vectors are plant integrating vectors in that on transformation, the vectors integrate a portion of vector DNA into the genome of the host plant. Exemplary A. tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 (e.g., see Schardl et al., Gene (1987) 61:1-11; and Berger et al., Proc. Natl. Acad. Sci. USA (1989) 86:8402-8406); and plasmid pBI 101.2 that is available from Clontech Laboratories, Inc. (Palo Alto, Calif.).

In addition to regulatory domains, a TET-like protein (e.g. TET1-like protein) of the present disclosure can be expressed as a fusion protein that is coupled to, for example, a maltose binding protein (“MBP”), glutathione S transferase (GST), hexahistidine, c-myc, or the FLAG epitope for ease of purification, monitoring expression, or monitoring cellular and subcellular localization.

Moreover, a recombinant nucleic acid encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure can be modified to improve expression of the recombinant protein in plants by using codon preference. When the recombinant nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended plant host where the nucleic acid is to be expressed. For example, recombinant nucleic acids of the present disclosure can be modified to account for the specific codon preferences and GC content preferences of monocotyledons and dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. (1989) 17:477-498).
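By way of illustration only, the use of codon preference when a coding sequence is prepared synthetically can be sketched as follows. The codon usage table below contains hypothetical placeholder frequencies rather than measured values for any plant, and the function names are assumptions made for this example; in practice a published usage table for the intended host would be substituted.

```python
# Hypothetical codon usage frequencies per amino acid (placeholder values,
# NOT measured data for any monocot or dicot host).
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.7, "AAA": 0.3},
    "L": {"CTC": 0.4, "CTG": 0.35, "TTG": 0.25},
    "*": {"TGA": 0.5, "TAA": 0.3, "TAG": 0.2},
}


def back_translate(peptide: str, usage: dict) -> str:
    """Encode a peptide using the most frequent codon for each residue."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in peptide)


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    cds = back_translate("MKL*", HYPOTHETICAL_USAGE)
    print(cds, f"GC fraction = {gc_content(cds):.3f}")
```

Selecting only the single most frequent codon is the simplest possible strategy; other weighting schemes could equally be used, and the resulting GC content can be compared against the preference of the intended host.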

In some embodiments, a TET-like protein (e.g. TET1-like protein) of the present disclosure can be used to create functional “overexpression” mutations in a plant by releasing repression of the target gene expression as a consequence of the reduced methylation of the target nucleic acid. Release of gene expression repression, which may lead to activation of gene expression, may be of a structural gene, e.g., one encoding a protein having for example enzymatic activity, or of a regulatory gene, e.g., one encoding a protein that in turn regulates expression of a structural gene.

The present disclosure further provides expression vectors encoding TET-like proteins (e.g. TET1-like proteins). A nucleic acid sequence coding for the desired recombinant nucleic acid of the present disclosure can be used to construct a recombinant expression vector which can be introduced into the desired host cell. A recombinant expression vector will typically contain a nucleic acid encoding a recombinant protein of the present disclosure, operably linked to transcriptional initiation regulatory sequences which will direct the transcription of the nucleic acid in the intended host cell, such as tissues of a transformed plant.

For example, plant expression vectors may include (1) a cloned gene under the transcriptional control of 5′ and 3′ regulatory sequences and (2) a dominant selectable marker. Such plant expression vectors may also contain, if desired, a promoter regulatory region (e.g., one conferring inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific/selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.

A plant promoter, or functional fragment thereof, can be employed to control the expression of a recombinant nucleic acid of the present disclosure in regenerated plants. The selection of the promoter used in expression vectors will determine the spatial and temporal expression pattern of the recombinant nucleic acid in the modified plant, e.g., the nucleic acid encoding the TET-like protein (e.g. TET1-like protein) of the present disclosure is only expressed in the desired tissue or at a certain time in plant development or growth. Certain promoters will express recombinant nucleic acids in all plant tissues and are active under most environmental conditions and states of development or cell differentiation (i.e., constitutive promoters). Other promoters will express recombinant nucleic acids in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, for example) and the selection will reflect the desired location of accumulation of the gene product. Alternatively, the selected promoter may drive expression of the recombinant nucleic acid under various inducing conditions.

Examples of suitable constitutive promoters may include, for example, the core promoter of the Rsyn7, the core CaMV 35S promoter (Odell et al., Nature (1985) 313:810-812), CaMV 19S (Lawton et al., 1987), rice actin (Wang et al., 1992; U.S. Pat. No. 5,641,876; and McElroy et al., Plant Cell (1985) 2:163-171); ubiquitin (Christensen et al., Plant Mol. Biol. (1989) 12:619-632; and Christensen et al., Plant Mol. Biol. (1992) 18:675-689), pEMU (Last et al., Theor. Appl. Genet. (1991) 81:581-588), MAS (Velten et al., EMBO J. (1984) 3:2723-2730), nos (Ebert et al., 1987), Adh (Walker et al., 1987), the P- or 2′-promoter derived from T-DNA of Agrobacterium tumefaciens, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP 1-8 promoter, and other transcription initiation regions from various plant genes known to skilled artisans, and constitutive promoters described in, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142. In some embodiments, expression of a nucleic acid of the present disclosure (e.g. a nucleic acid encoding a TET1-like protein) may be driven (in operable linkage) with a UBQ10 promoter.

Examples of suitable tissue specific promoters may include, for example, the lectin promoter (Vodkin et al., 1983; Lindstrom et al., 1990), the corn alcohol dehydrogenase 1 promoter (Vogel et al., 1989; Dennis et al., 1984), the corn light harvesting complex promoter (Simpson, 1986; Bansal et al., 1992), the corn heat shock protein promoter (Odell et al., Nature (1985) 313:810-812; Rochester et al., 1986), the pea small subunit RuBP carboxylase promoter (Poulsen et al., 1986; Cashmore et al., 1983), the Ti plasmid mannopine synthase promoter (Langridge et al., 1989), the Ti plasmid nopaline synthase promoter (Langridge et al., 1989), the petunia chalcone isomerase promoter (Van Tunen et al., 1988), the bean Glycine rich protein 1 promoter (Keller et al., 1989), the truncated CaMV 35s promoter (Odell et al., Nature (1985) 313:810-812), the potato patatin promoter (Wenzler et al., 1989), the root cell promoter (Conkling et al., 1990), the maize zein promoter (Reina et al., 1990; Kriz et al., 1987; Wandelt and Feix, 1989; Langridge and Feix, 1983; Reina et al., 1990), the globulin-1 promoter (Belanger and Kriz et al., 1991), the α-tubulin promoter, the cab promoter (Sullivan et al., 1989), the PEPCase promoter (Hudspeth & Grula, 1989), the R gene complex-associated promoters (Chandler et al., 1989), and the chalcone synthase promoters (Franken et al., 1991).

Alternatively, the plant promoter can direct expression of a recombinant nucleic acid of the present disclosure in a specific tissue or may be otherwise under more precise environmental or developmental control. Such promoters are referred to here as “inducible” promoters. Environmental conditions that may affect transcription by inducible promoters include, for example, pathogen attack, anaerobic conditions, or the presence of light. Examples of inducible promoters include, for example, the AdhI promoter which is inducible by hypoxia or cold stress, the Hsp70 promoter which is inducible by heat stress, and the PPDK promoter which is inducible by light. Examples of promoters under developmental control include, for example, promoters that initiate transcription only, or preferentially, in certain tissues, such as leaves, roots, fruit, seeds, or flowers. An exemplary promoter is the anther specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051). The operation of a promoter may also vary depending on its location in the genome. Thus, an inducible promoter may become fully or partially constitutive in certain locations.

Moreover, any combination of a constitutive or inducible promoter, and a non-tissue specific or tissue specific promoter may be used to control the expression of a TET-like protein (e.g. TET1-like protein) of the present disclosure.

The recombinant nucleic acids of the present disclosure and/or a vector housing a recombinant nucleic acid of the present disclosure, may also contain a regulatory sequence that serves as a 3′ terminator sequence. One of skill in the art would readily recognize a variety of terminators that may be used in the recombinant nucleic acids of the present disclosure. For example, a recombinant nucleic acid of the present disclosure may contain a 3′ NOS terminator. Further, a native terminator from a TET protein (e.g. a TET1 protein) of the present disclosure may also be used in the recombinant nucleic acids of the present disclosure.

Plant transformation protocols as well as protocols for introducing recombinant nucleic acids of the present disclosure into plants may vary depending on the type of plant or plant cell, e.g., monocot or dicot, targeted for transformation. Suitable methods of introducing recombinant nucleic acids of the present disclosure into plant cells and subsequent insertion into the plant genome include, for example, microinjection (Crossway et al., Biotechniques (1986) 4:320-334), electroporation (Riggs et al., Proc. Natl. Acad. Sci. USA (1986) 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. No. 5,563,055), direct gene transfer (Paszkowski et al., EMBO J. (1984) 3:2717-2722), and ballistic particle acceleration (U.S. Pat. No. 4,945,050; Tomes et al. (1995). “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment,” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al., Biotechnology (1988) 6:923-926).

Additionally, a TET-like protein (e.g. TET1-like protein) of the present disclosure can be targeted to a specific organelle within a plant cell. Targeting can be achieved by providing the recombinant protein with an appropriate targeting peptide sequence. Examples of such targeting peptides include, for example, secretory signal peptides (for secretion or cell wall or membrane targeting), plastid transit peptides, chloroplast transit peptides, mitochondrial target peptides, vacuole targeting peptides, nuclear targeting peptides, and the like (e.g., see Reiss et al., Mol. Gen. Genet. (1987) 209(1):116-121; Settles and Martienssen, Trends Cell Biol (1998) 12:494-501; Scott et al., J Biol Chem (2000) 10:1074; and Luque and Correas, J Cell Sci (2000) 113:2485-2495).

The modified plant cells may be grown into plants in accordance with conventional ways (e.g., see McCormick et al., Plant Cell Reports (1986) 81-84). These plants may then be grown, and pollinated with either the same transformed strain or different strains, with the resulting hybrid having the desired phenotypic characteristic. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited, and then seeds may be harvested to ensure the desired phenotype or other property has been achieved.

Methods of Reducing Methylation of a Target Nucleic Acid in Plants

Growing conditions sufficient for the recombinant TET-like polypeptides (e.g. TET1-like polypeptides) of the present disclosure to be expressed in the plant, to be targeted to one or more target nucleic acids of the present disclosure, and to reduce the methylation of those target nucleic acids are well known in the art and include any suitable growing conditions disclosed herein. Typically, the plant is grown under conditions sufficient to express a recombinant polypeptide of the present disclosure (e.g. TET1-like proteins), and for the expressed recombinant polypeptide to be localized to the nucleus of cells of the plant in order to be targeted to and reduce the methylation of the target nucleic acids. Generally, the conditions sufficient for the expression of the recombinant polypeptide will depend on the promoter used to control the expression of the recombinant polypeptide. For example, if an inducible promoter is utilized, expression of the recombinant polypeptide in a plant will require that the plant be grown in the presence of the inducer.

As noted above, growing conditions sufficient for the recombinant polypeptides of the present disclosure to be expressed in the plant, to be targeted to one or more target nucleic acids, and to reduce methylation and/or activate or reduce the expression of those target nucleic acids may vary depending on a number of factors (e.g. species of plant, use of inducible promoter, etc.). Suitable growing conditions may include, for example, ambient environmental conditions, standard greenhouse conditions, growth in long days under standard environmental conditions (e.g. 16 hours of light, 8 hours of dark), growth in 12 hour light:12 hour dark day/night cycles, etc.

Various time frames may be used to observe activation in expression and/or targeted demethylation of a target nucleic acid according to the methods of the present disclosure. Plants may be observed/assayed for activation in expression and/or targeted demethylation of a target nucleic acid after, for example, about 5 days of growth, about 10 days of growth, about 15 days of growth, about 20 days of growth, about 25 days of growth, about 30 days of growth, about 35 days of growth, about 40 days of growth, about 50 days of growth, or 55 days or more of growth.

Reduced methylation of a target nucleic acid induced by targeting a TET-like protein (e.g. TET1-like protein) to the target nucleic acid may be stable in plants even in the absence of the TET-like protein (e.g. TET1-like protein) in the plant. Accordingly, the methods of the present disclosure may allow one or more target nucleic acids in a plant to maintain a reduced level of methylation after a nucleic acid encoding a TET-like protein (e.g. TET1-like protein) has been crossed out or otherwise removed from the plant. For example, after targeting a particular genomic region with a TET-like protein (e.g. TET1-like protein) according to the methods of the present disclosure, the reduced level of methylation of the targeted region may remain stable even after crossing away the transgenes. It is an object of the present disclosure to provide plants having reduced methylation of one or more target nucleic acids according to the methods of the present disclosure. As the methods of the present disclosure may allow one or more target nucleic acids in a plant to remain in their state of reduced methylation after a recombinant polynucleotide encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure has been crossed out of the plant, the progeny plants of these plants may have reduced methylation of one or more target nucleic acids even in the absence of the recombinant polynucleotides that produce the recombinant polypeptides of the present disclosure.

A target nucleic acid of the present disclosure in a plant cell housing a TET-like protein (e.g. TET1-like protein) of the present disclosure may have its level of methylation reduced by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% as compared to a corresponding control. Various controls will be readily apparent to one of skill in the art. For example, a control may be a corresponding plant or plant cell that does not contain a nucleic acid encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure.
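As a minimal sketch of how such a percentage might be calculated, assuming methylation levels are expressed as fractions between 0 and 1 for the same target nucleic acid in a control plant and in a plant containing the TET-like protein, one could write the following. The function name and the example values are hypothetical and do not represent data from the present disclosure.

```python
def percent_reduction(control_level: float, treated_level: float) -> float:
    """Percent reduction in methylation relative to a corresponding control.

    Both arguments are methylation levels on the same scale (e.g. fraction
    of methylated cytosines); the control level must be greater than zero.
    """
    if control_level <= 0:
        raise ValueError("control methylation level must be positive")
    return 100.0 * (control_level - treated_level) / control_level


if __name__ == "__main__":
    # Hypothetical example values: 80% methylation in the control,
    # 20% methylation at the same locus in the treated plant.
    print(f"{percent_reduction(0.80, 0.20):.1f}% reduction")
```

A complete loss of methylation at the target gives a value of 100%, which is the upper bound of this measure.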

A target nucleic acid of the present disclosure having reduced methylation as compared to a corresponding control nucleic acid may exhibit a reduction in methylation over a number of nucleotides including and adjacent to the targeted nucleotide sequences in a target nucleic acid. For example, the reduction in methylation may be present over one nucleotide, over about 5 nucleotides, over about 10 nucleotides, over about 15 nucleotides, over about 20 nucleotides, over about 25 nucleotides, over about 30 nucleotides, over about 35 nucleotides, over about 40 nucleotides, over about 45 nucleotides, over about 50 nucleotides, over about 55 nucleotides, over about 60 nucleotides, over about 75 nucleotides, over about 100 nucleotides, over about 125 nucleotides, over about 150 nucleotides, over about 175 nucleotides, over about 200 nucleotides, over about 225 nucleotides, over about 250 nucleotides, over about 275 nucleotides, over about 300 nucleotides, over about 350 nucleotides, over about 400 nucleotides, over about 450 nucleotides, over about 500 nucleotides, over about 600 nucleotides, over about 700 nucleotides, over about 800 nucleotides, over about 900 nucleotides, over about 1,000 nucleotides, over about 1,500 nucleotides, over about 2,000 nucleotides, over about 2,500 nucleotides, or over about 3,000 nucleotides or more as compared to corresponding nucleotides in a corresponding control nucleic acid. The reduction in methylation of nucleotides adjacent to the target nucleotides in the target nucleic acid may occur in nucleotides that are 5′ to the target nucleotide sequences, 3′ to the target nucleotide sequences, or both 5′ and 3′ to the target nucleotide sequences.

A target nucleic acid of the present disclosure may have its expression upregulated/activated as compared to a corresponding control nucleic acid. A target nucleic acid may have its expression upregulated at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 40-fold, at least about 50-fold, at least about 75-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 300-fold, at least about 400-fold, at least about 500-fold, at least about 600-fold, at least about 700-fold, at least about 800-fold, at least about 900-fold, at least about 1,000-fold, at least about 1,250-fold, at least about 1,500-fold, at least about 1,750-fold, at least about 2,000-fold, at least about 2,500-fold, at least about 3,000-fold, at least about 3,500-fold or more as compared to a corresponding control nucleic acid. As stated above, various controls will be readily apparent to one of skill in the art. For example, a control nucleic acid may be a corresponding nucleic acid from a plant or plant cell that does not contain a nucleic acid encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure.

A target nucleic acid of the present disclosure may have its expression downregulated/reduced, or silenced, as compared to a corresponding control nucleic acid. A target nucleic acid may have its expression reduced by at least about 1-fold, at least about 2-fold, at least about 3-fold, at least about 4-fold, at least about 5-fold, at least about 10-fold, at least about 15-fold, at least about 20-fold, at least about 25-fold, at least about 30-fold, at least about 40-fold, at least about 50-fold, at least about 75-fold, at least about 100-fold, at least about 150-fold, at least about 200-fold, at least about 300-fold, at least about 400-fold, at least about 500-fold, at least about 600-fold, at least about 700-fold, at least about 800-fold, at least about 900-fold, at least about 1,000-fold, at least about 1,250-fold, at least about 1,500-fold, at least about 1,750-fold, at least about 2,000-fold, at least about 2,500-fold, at least about 3,000-fold, at least about 3,500-fold or more as compared to a corresponding control nucleic acid. As stated above, various controls will be readily apparent to one of skill in the art. For example, a control nucleic acid may be a corresponding nucleic acid from a plant or plant cell that does not contain a nucleic acid encoding a TET-like protein (e.g. TET1-like protein) of the present disclosure.

Methods of probing the methylation status of a nucleic acid are well-known to those of skill in the art. For example, bisulfite sequencing and nucleic acid analysis may be used to determine the methylation status, on a nucleotide-by-nucleotide basis, of a population of nucleic acids isolated from a nucleic acid-containing sample (e.g. plants, plant tissues, or plant cells).
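
As one illustration of a nucleotide-by-nucleotide readout, the following Python sketch estimates the methylation level at a single cytosine from bisulfite sequencing base calls. It assumes the standard bisulfite principle that unmethylated cytosines are read as T while methylated cytosines remain C. The function names and the input format are hypothetical and are not part of the present disclosure.

```python
def methylation_level(base_calls):
    """Return the fraction of informative reads supporting methylation
    at one reference cytosine, or None if no read is informative.

    base_calls is a string of the bases observed at that position across
    aligned reads (hypothetical input format): 'C' means protected from
    conversion (methylated), 'T' means converted (unmethylated).
    Any other character is ignored.
    """
    methylated = base_calls.count("C")
    unmethylated = base_calls.count("T")
    covered = methylated + unmethylated
    if covered == 0:
        return None
    return methylated / covered


def methylation_reduction(sample_calls, control_calls):
    """Return control level minus sample level at the same cytosine.

    A positive value indicates reduced methylation in the sample.
    Returns None if either position lacks informative reads.
    """
    sample = methylation_level(sample_calls)
    control = methylation_level(control_calls)
    if sample is None or control is None:
        return None
    return control - sample
```

Applied across every cytosine in a window, the per-position differences could be used to judge how far a reduction in methylation extends 5′ and 3′ of a targeted sequence.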

It is to be understood that while the present disclosure has been described in conjunction with the preferred specific embodiments thereof, the foregoing description is intended to illustrate and not limit the scope of the present disclosure. Other aspects, advantages, and modifications within the scope of the present disclosure will be apparent to those skilled in the art to which the present disclosure pertains.

EXAMPLES

The following examples are offered to illustrate provided embodiments and are not intended to limit the scope of the present disclosure.

Example 1: DNA-Binding Domain-Targeting of Demethylation Factor TET1 (Catalytic Domain) to the FWA Locus in Arabidopsis

This Example demonstrates the targeting of the catalytic domain of a TET1 protein to a specific locus to cause DNA demethylation in plants.

Introduction

DNA methylation controls gene expression in many different organisms, including plants. Applicant has previously shown that artificial zinc fingers (AZF) can be used for targeted methylation and repression of gene expression in Arabidopsis (Johnson et al, 2014). Using the same AZF, ZF108, this Example demonstrates targeted DNA demethylation in Arabidopsis. To do so, the catalytic domain of TET1, a protein involved in DNA demethylation in mammals (Ito et al, 2011, Gue et al, 2011), was heterologously fused to ZF108.

The TET1 catalytic domain has been shown to cause DNA demethylation in other organisms when artificially targeted to genomic locations using Artificial Zinc Fingers, TAL effectors and CRISPR/Cas9. However, such a method has not been shown to work in plants. Moreover, given that TET1 is not a native plant protein and given that plant DNA methylation is in many ways different from animal DNA methylation, it was not known that such a method could even work in plants.

In the present Example, Applicant fused the catalytic domain of TET1 to the C-terminal tail of ZF108 and expressed this fusion protein under the control of the constitutive promoter UBQ10 in wild-type Arabidopsis plants. The TET1 catalytic domain was amplified from the pJFA334E9 plasmid provided by the Joung lab through Addgene. Importantly, ZF108 was designed to bind to the promoter of the reporter gene FWA in Arabidopsis (Johnson et al, 2013). In wild-type plants, this gene is repressed due to DNA methylation in its promoter. Absence of methylation causes FWA overexpression and an associated late flowering phenotype. Therefore, wild-type plants expressing the chimeric protein ZF108-TET1 (catalytic domain) were screened for a late flowering phenotype, indicative of FWA overexpression and a likely consequence of promoter de-methylation. From this screen, Applicant identified plants exhibiting a late flowering phenotype as compared to wild-type plants. Following identification of these late flowering plants, their DNA was extracted and digested with the methylation-sensitive restriction enzyme McrBC. The results demonstrated that plants expressing ZF108-TET1 (catalytic domain) had low methylation at the FWA promoter compared to wild type. Whole-genome Bisulfite Sequencing was performed to analyze the impact of ZF108-TET1 (catalytic domain) genome-wide. Finally, gene expression of the same samples was analyzed by RNA-seq in order to observe potential changes in gene expression due to demethylation.

Materials and Methods

Cloning of pUBQ10::ZF_3×Flag_TET1-CD

For this purpose, a modified pMDC123 plasmid (Curtis et al, 2003, Plant Phys) was created first, containing 1990 bp of the promoter region of the Arabidopsis UBQ10 gene upstream of the BLRP_ZF108_3×Flag cassette. Both the UBQ10 promoter and BLRP_ZF108_3×Flag are upstream of the gateway cassette (Invitrogen) present in the original pMDC123 plasmid. The catalytic domain of the human TET1 protein (TET1-CD) was amplified from the plasmid pJFA334E9 (Addgene) and cloned into the pENTR/D plasmid (Invitrogen) and then delivered into the modified pMDC123 by LR reaction (Invitrogen), creating an in-frame fusion of the TET1-CD cDNA with the upstream BLRP_ZF108_3×Flag cassette.

The nucleotide sequence of pUBQ10::ZF108_3×Flag_TET1-CD is presented in SEQ ID NO: 21. This expression cassette contains a UBQ10 promoter (SEQ ID NO: 22), the ZF108 DNA-binding domain that targets the FWA promoter (SEQ ID NO: 23), a 3×Flag tag (SEQ ID NO: 24), the catalytic domain of human TET1 (SEQ ID NO: 25), and an OCS terminator sequence (SEQ ID NO: 26). The pUBQ10::ZF108_3×Flag_TET1-CD expression cassette encodes the ZF108_3×Flag_TET1-CD fusion protein, whose amino acid sequence is set forth in SEQ ID NO: 27. Polypeptides in the fusion protein include ZF108 (SEQ ID NO: 28), 3×Flag (SEQ ID NO: 29), and human TET1-CD (SEQ ID NO: 30).

Plant Transformation and Flowering Time Measurement

The construct above was introduced into Col-0 wild-type Arabidopsis thaliana plants using Agrobacterium-mediated transformation. T1 transgenic plants were selected based on their resistance to BASTA. Following selection, plants were grown on soil under a long day photoperiod until the plants flowered. Flowering time was scored by counting the number of rosette and cauline leaves.

CHOP-PCR

Plant DNA was extracted following a CTAB-based protocol. 1 μg of DNA was digested with the methylation sensitive enzyme McrBC for 4 h at 37° C. As a non-digested control, 1 μg of DNA was incubated for 4 h at 37° C. in digestion buffer without the enzyme. Quantitative Real-time PCR was done to amplify a region of the FWA promoter using the oligos (ttgggtttagtgtttacttg) (SEQ ID NO: 167) and (gaatgttgaatgggataaggta) (SEQ ID NO: 168). As a control region, the gene body of another gene was analyzed using the oligos (tgcaatttgtctgcttgctaatg) (SEQ ID NO: 169) and (tcatttataatggacgatgcc) (SEQ ID NO: 170). After PCR, the ratio of digested over non-digested DNA was calculated.
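
The ratio of digested over non-digested DNA can be derived from quantitative PCR cycle threshold (Ct) values. The Python sketch below is a minimal illustration that assumes a Ct-based calculation with perfect doubling of product per cycle. The text above does not state how the ratio was computed, and the function name and the `efficiency` parameter are hypothetical. Because McrBC cleaves methylated DNA, a ratio near 1 suggests little methylation at the amplified region, whereas a low ratio suggests that the region is methylated.

```python
def chop_pcr_ratio(ct_digested, ct_undigested, efficiency=2.0):
    """Estimate the amount of amplifiable template in the McrBC-digested
    sample relative to the mock-digested sample.

    ct_digested / ct_undigested are qPCR Ct values for the same amplicon.
    efficiency is the fold amplification per cycle (2.0 = perfect doubling,
    an assumption made for this sketch).
    """
    # Each extra cycle needed by the digested sample means `efficiency`-fold
    # less starting template survived digestion.
    return efficiency ** (ct_undigested - ct_digested)


if __name__ == "__main__":
    # Illustrative Ct values only, not measured data.
    print(chop_pcr_ratio(ct_digested=25.0, ct_undigested=25.0))
    print(chop_pcr_ratio(ct_digested=28.0, ct_undigested=25.0))
```

The same calculation would be applied to the control region so that the FWA promoter and the control amplicon can be compared within each plant.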

Bisulfite Sequencing and Data Analysis

BS-Seq libraries were generated as previously reported (Cokus et al., 2008) and all libraries were sequenced using the HiSeq 2000 platform following manufacturer instructions (Illumina) at a length of 50 bp. Bisulfite-Seq (BS-Seq) reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-seeker. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

RNA-seq

Raw reads in qseq format obtained from the sequencer were first converted to fastq format with a customized Perl script. Read quality was assessed with FastQC (worldwide web:bioinformatics.babraham.ac.uk/projects/fastqc). High quality reads were then aligned to hg19 reference genome using TopHat (Trapnell et al, 2009) (v 2.0.13) with the ‘--no-coverage-search’ option, allowing up to two mismatches and only keeping reads that mapped to one location. Essentially, reads were first mapped to TAIR10 gene annotation with known splice junctions. When reads did not map to the annotated genes, the reads were mapped to hg19 genome. The number of reads mapping to genes was calculated by HTSeq (Anders et al., 2015) (v 0.5.4) with default parameters. Expression levels were determined by RPKM (reads per kilobase of exons per million aligned reads) in R using customized scripts.
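
For reference, the RPKM definition given above (reads per kilobase of exons per million aligned reads) can be written out as a short calculation. The customized R scripts are not reproduced in this document, so the following Python sketch only illustrates the standard formula. The function and parameter names are hypothetical.

```python
def rpkm(gene_read_count, exon_length_bp, total_aligned_reads):
    """Reads per kilobase of exon per million aligned reads.

    gene_read_count: reads assigned to the gene (e.g. from a read-counting tool)
    exon_length_bp: summed exon length of the gene, in base pairs
    total_aligned_reads: number of aligned reads in the library
    """
    if exon_length_bp <= 0 or total_aligned_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return gene_read_count * 1e9 / (exon_length_bp * total_aligned_reads)


def fold_change(sample_rpkm, control_rpkm):
    """Ratio of sample to control expression; None if the control is zero."""
    if control_rpkm == 0:
        return None
    return sample_rpkm / control_rpkm
```

A fold change computed this way between a transgenic line and a wild-type control is one way to express upregulation of a target such as FWA.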

Results

To explore whether ZF108_TET1-CD would be able to trigger demethylation and reactivate the expression of FWA, wild-type Col-0 plants were transformed with the ZF108_TET1-CD containing construct described above. Flowering time of T1 transgenic plants was assayed, and results are presented below in Table 1A.

TABLE 1A
Flowering Time Results

Line             Early flowering   Late flowering
ZF108_TET1-CD    32                25

The results presented in Table 1A demonstrate that the catalytic domain of human TET1 fused to a zinc finger that targets the FWA locus can efficiently promote late flowering in wild-type plants. A more thorough assessment of these results is presented in FIG. 1. From FIG. 1, it is seen that wild-type Col-0 plants exhibit their normal “early” flowering time. In contrast, fwa mutants, which contain an epimutation in the FWA promoter that results in loss of methylation at the FWA promoter and consequent activation/expression of FWA (a flowering time repressor), exhibit their canonical “late” flowering time phenotype. Interestingly, a number of plants carrying the ZF108_TET1-CD construct in the Col-0 genetic background exhibited a “late” flowering phenotype more analogous to fwa mutants, suggesting that this construct can promote late flowering in otherwise wild-type plants.

In order to analyze whether the late flowering phenotype of plants harboring ZF108_TET1-CD as described in Table 1A was due to de-methylation of the FWA promoter, CHOP-PCR using the methylation sensitive enzyme McrBC was performed on DNA obtained from these plants. As shown in FIG. 2, the digested/non-digested profile for the FWA promoter in plants harboring the ZF108_TET1-CD construct is similar to the FWA promoter profile for fwa-4 plants, indicative of a lack of DNA methylation at the FWA promoter. However, while fwa-4 plants also show demethylation at the control region analyzed, the ZF108_TET1-CD lines show a profile at the control region that is similar to wild-type plants, suggesting that demethylation is happening specifically at the FWA promoter.

To further investigate the loss of methylation at the FWA promoter that appeared to be conferred by ZF108_TET1-CD, a whole-genome bisulfite sequencing assay was performed in four independent ZF108_TET1-CD lines that showed the late flowering phenotype. Bisulfite sequencing experiments were conducted as described above. The results, which are presented in FIG. 3 and FIG. 4, show that effective DNA demethylation was achieved by targeting the TET1 catalytic domain to the FWA promoter. Importantly, this effect was specific to the FWA promoter, and other methylated regions in the general vicinity of the targeted genomic region were not affected (FIG. 4).

In order to determine if the late flowering phenotype observed in the different ZF108_TET1-CD lines was due to the activation of FWA expression, RNA-seq was performed with four independent T1 lines. The results presented in FIG. 5 show that FWA was upregulated in all transgenic lines tested, while two control housekeeping genes remained unaffected. The results with the ZF108_TET1-CD transgenic lines were comparable to the results observed in fwa-4 plants, which are known to exhibit loss of methylation at the FWA promoter and have activated expression of FWA as compared to wild-type plants. Thus, the results presented in this Example demonstrate that specific targeting of the TET1 catalytic domain to a genomic region of interest can be used to target demethylation and gene activation in plants in a very specific manner.

Example 2: CRISPR-Targeting of a TET1 Polypeptide to Specific Loci

This Example describes exemplary experimental guidelines for constructing fusion constructs containing TET1 polypeptides as disclosed herein fused to dCAS9 proteins. These constructs may be used to target a TET1 polypeptide to a specific locus of a plant genome using the CRISPR-CAS9 system to induce de-methylation of the target nucleic acid. This particular example describes exemplary constructs to target the FWA locus.

Materials and Methods

Construction of TET1-CD Fusion Proteins and gRNA-fwa

For this purpose, a modified pMDC123 plasmid (Curtis et al, 2003, Plant Phys) will be created first. A fragment containing 1986 bp of the promoter region of the Arabidopsis UBQ10 gene will be cloned, followed by an omega RBC translational enhancer and then a human codon optimized dCAS9, creating pMDC UBQ10_dCAS9_Gateway. An attL1 site followed by an HA tag, two nuclear localization signals (NLS), the catalytic domain of TET1 protein (TET1-CD), and an attL2 site will be created through gene synthesis and inserted into pUC57 to create pUC57 attL1_1×HA_2×NLS_TET1-CD_attL2. The 1×HA_2×NLS_TET1-CD will be delivered into pMDC UBQ10_dCAS9_Gateway by LR reaction (Invitrogen), creating an in-frame fusion of 1×HA_2×NLS_TET1-CD with the upstream dCAS9 cassette and yielding pMDC UBQ10_dCAS9_1×HA_2×NLS_TET1-CD.

Three different gRNA expression cassettes, a gRNA cassette driven by a U6 promoter expressing a single gRNA, a tRNA-gRNA expression cassette driven by a U6 promoter with two different gRNAs, and a tRNA-gRNA expression cassette driven by a U6 promoter with four different gRNAs, will be created by gene synthesis. Independent of each other, each individual gRNA system will be inserted at the PmeI restriction site of pMDC UBQ10_dCAS9_1×HA_2×NLS_TET1-CD upstream of the UBQ10 promoter, creating: pMDC U6_gRNA_UBQ10_dCAS9_1×HA_2×NLS_TET1-CD, pMDC U6_tRNA-gRNA×2_UBQ10_dCAS9_1×HA_2×NLS_TET1-CD, and pMDC U6_tRNA_gRNA×4_UBQ10_dCAS9_1×HA_2×NLS_TET1-CD.

The exemplary expression cassette of UBQ10_dCAS9_1×HA_2×NLS_TET1-CD will contain a number of features. The nucleotide sequence of the expression cassette is presented in SEQ ID NO: 31. This cassette includes a UBQ10 promoter (SEQ ID NO: 32), an Omega RBC (SEQ ID NO: 33), a dCAS9 polypeptide (SEQ ID NO: 34), 1×HA tag (SEQ ID NO: 35), a nuclear localization signal (SEQ ID NO: 36), a linker (SEQ ID NO: 37), the catalytic domain of human TET1 (TET1-CD) (SEQ ID NO: 38), and an OCS terminator sequence (SEQ ID NO: 39).

The amino acid sequence of the dCas9_1×HA_2×NLS_TET1-CD fusion protein is presented in SEQ ID NO: 40. The following amino acid sequences are present in this fusion protein: dCAS9 (SEQ ID NO: 41), 1×HA (SEQ ID NO: 42), 2×NLS (SEQ ID NO: 43), linker (SEQ ID NO: 44), and TET1-CD (SEQ ID NO: 45).

To target the FWA locus, various gRNA sequences will be tested, as presented in Table 2A. These gRNA sequences will be present in single gRNA cassettes as well as in a series of tRNA-gRNA expression cassettes. CRISPR-targeting technology involving tRNA-gRNA expression cassettes is described in Xie et al, PNAS (2015). This will allow for the delivery of multiple gRNAs simultaneously with high expression level.

TABLE 2A
gRNA Molecules Targeting the FWA Promoter

gRNA Name   crRNA Sequence (5′ → 3′)
gRNA3       ATTCTCGACGGAAAGATGTA (SEQ ID NO: 171)
gRNA4       ACGGAAAGATGTATGGGCTT (SEQ ID NO: 172)
gRNA12      TTCATACGAGCGCCGCTCTA (SEQ ID NO: 173)
gRNA14      CCATTGGTCCAAGTGCTATT (SEQ ID NO: 174)
gRNA16      GCGGCGCAAGATCTGATATT (SEQ ID NO: 175)
gRNA17      AAAACTAGGCCATCCATGGA (SEQ ID NO: 176)

One exemplary tRNA-gRNA expression cassette will contain two different gRNA molecules: gRNA4 and gRNA17. This cassette will be called U6p::tRNA-4-17, and the nucleotide sequence of this cassette is presented in SEQ ID NO: 46. Other features of this cassette include a U6 promoter (SEQ ID NO: 47), tRNA (SEQ ID NO: 48), gRNA backbone (SEQ ID NO: 49), and a PolIII terminator sequence (TTTTTTT).

Another exemplary tRNA-gRNA expression cassette will contain four different gRNA molecules: gRNA16, gRNA14, gRNA3, and gRNA17. This cassette will be called U6p::tRNA-16-14-3-17, and the nucleotide sequence of this cassette is presented in SEQ ID NO: 51. Other features of this cassette include a U6 promoter (SEQ ID NO: 47), tRNA (SEQ ID NO: 48), gRNA backbone (SEQ ID NO: 49), and a PolIII terminator sequence (TTTTTTT).
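
The general layout of such multi-gRNA cassettes can be sketched in Python as a string assembly. This is only an illustration under stated assumptions: the promoter, tRNA, and gRNA backbone strings below are placeholders (the actual sequences are those of SEQ ID NOs: 47 to 49 and are not reproduced here), and the ordering of promoter, repeated tRNA-protospacer-backbone units, and terminator is an assumption based on the general tRNA-gRNA design. The authoritative cassette sequences are SEQ ID NO: 46 and SEQ ID NO: 51. The function name is hypothetical.

```python
# Protospacer (crRNA) sequences for the FWA promoter, as listed in Table 2A.
PROTOSPACERS = {
    "gRNA3": "ATTCTCGACGGAAAGATGTA",
    "gRNA4": "ACGGAAAGATGTATGGGCTT",
    "gRNA12": "TTCATACGAGCGCCGCTCTA",
    "gRNA14": "CCATTGGTCCAAGTGCTATT",
    "gRNA16": "GCGGCGCAAGATCTGATATT",
    "gRNA17": "AAAACTAGGCCATCCATGGA",
}

POLIII_TERMINATOR = "TTTTTTT"  # terminator given in the text


def build_trna_grna_cassette(grna_names, promoter, trna, backbone,
                             terminator=POLIII_TERMINATOR):
    """Concatenate promoter + (tRNA + protospacer + backbone) per gRNA + terminator.

    promoter, trna and backbone are supplied by the caller; the element
    order is an assumption made for this sketch.
    """
    units = []
    for name in grna_names:
        spacer = PROTOSPACERS[name]
        # Sanity check: 20-nt protospacer made only of A, C, G, T.
        if len(spacer) != 20 or set(spacer) - set("ACGT"):
            raise ValueError(f"unexpected protospacer for {name}")
        units.append(trna + spacer + backbone)
    return promoter + "".join(units) + terminator


if __name__ == "__main__":
    # Placeholder element strings, NOT real sequences.
    cassette = build_trna_grna_cassette(
        ["gRNA4", "gRNA17"], promoter="[U6]", trna="[tRNA]", backbone="[backbone]"
    )
    print(cassette)
```

Passing the names gRNA16, gRNA14, gRNA3, and gRNA17 in that order would give the corresponding four-gRNA layout.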

Transformation of Col-0 Plants

The construct described above will be transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct is transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well-known in the art.

Flowering Time Measurements

Progeny of transformed plants (T1s) will be planted and screened for BASTA-resistant plants that incorporate the T-DNA into the Arabidopsis genome, which confers resistance to BASTA. Among the BASTA-resistant transgenic plants, flowering time will be measured and compared to early-flowering wild-type Col-0 and late-flowering fwa-4 plants. Flowering time will be measured by counting the total number of leaves (rosette and cauline) of each individual plant.
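
The leaf-count comparison can be illustrated with a short Python sketch. The scoring of a plant as late flowering by comparison to the midpoint between the mean leaf numbers of the two control groups is a hypothetical criterion chosen only for illustration; no such threshold is specified herein, and the function names and example numbers are likewise hypothetical.

```python
from statistics import mean


def total_leaves(rosette, cauline):
    """Total leaf number at flowering, used as the flowering-time measure."""
    return rosette + cauline


def is_late_flowering(plant_total, early_control_totals, late_control_totals):
    """Return True if the plant's leaf number exceeds the midpoint between
    the mean of early-flowering controls (e.g. Col-0) and the mean of
    late-flowering controls (e.g. fwa-4). Hypothetical criterion.
    """
    threshold = (mean(early_control_totals) + mean(late_control_totals)) / 2
    return plant_total > threshold


if __name__ == "__main__":
    # Made-up leaf counts for illustration only.
    col0 = [14, 15, 16]
    fwa4 = [38, 40, 42]
    plant = total_leaves(rosette=24, cauline=6)
    print(plant, is_late_flowering(plant, col0, fwa4))
```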

Data Analysis

Plants transformed with the fusion constructs described above will be evaluated for phenotypic differences as compared to corresponding control plants (e.g. wild-type plants and fwa-4 plants) which are suggestive of successful fusion protein targeting to the locus of interest and subsequent de-methylation and/or transcriptional activation at the locus. The phenotype evaluated may vary depending on the locus targeted. Other analyses to be performed may include measuring the expression level of the targeted locus in the transformed plants, measuring the degree of DNA methylation at the targeted locus in the transformed plants (using e.g. bisulfite sequencing), or other assays well-known to those of skill in the art.

It is thought that the fusion proteins containing a TET1 polypeptide as described herein and a dCAS9 protein will be able to successfully target a locus of interest and induce DNA de-methylation of the target locus.

Example 3: Modified CRISPR-Targeting of TET1 Polypeptide to Specific Loci Using MS2 Coat Proteins

This Example describes exemplary experimental guidelines for constructing recombinant constructs for use in a modified CRISPR-targeting scheme involving TET1 polypeptides as disclosed herein, dCAS9 proteins, and MS2 coat proteins. These constructs may be used to target a TET1 polypeptide to a specific locus of a genome using the CRISPR-CAS9 system.

Example 2 describes the recombinant fusing of TET1 polypeptides to a dCAS9 protein to target TET1 to a specific locus (e.g. FWA locus). However, it is possible that in some instances, the fusion between the TET1 polypeptide and the dCAS9 protein may impact the function of the TET1 polypeptide, the dCAS9 protein, or both the TET1 polypeptide and the dCAS9 protein. Indeed, it is already known that recombinant fusion of heterologous proteins to CAS9 proteins can impact CAS9 function. For example, Morita et al (Nature Biotechnology 34, 1060-1065 (2016)) demonstrated that targeted demethylation using TET1 in animal cells is more efficient using the SunTag system, where TET1 is not fused directly to dCas9, as compared to standard straight fusions of TET1 to dCas9 through a small linker.

One way to circumvent the potential issues with CAS9 fusion proteins is to CRISPR-target the TET1 polypeptide to the locus of interest by a method other than fusing the TET1 polypeptide to the dCAS9 protein. One such method involves adding to the gRNA a small RNA sequence that binds to a specific protein, which can then be fused to the TET1 polypeptide. Recently, work by Konermann et al. 2014 showed that two loops in the gRNA backbone (tetraloop and stem 2) can be modified without negative effects on gRNA-CAS9 activity. They added to these loops a hairpin aptamer that selectively binds dimerized MS2 bacteriophage coat proteins and showed that MS2-mediated recruitment of the transcriptional activator VP64 to the gRNA-CAS9 complex was able to induce expression of a target gene.

A similar technique will be used herein to bypass the possible negative effect that a TET1 polypeptide or the CAS9 protein may have on each other's activity when expressed as a fusion protein in a plant cell. A fusion protein between MS2 and the catalytic domain of TET1 (TET1-CD) will be constructed. The diagram presented in FIG. 6 is a representative scheme of this three-component system: (CAS9/gRNA-MS2-aptamer/MS2-TET1-CD).

A guide RNA designed to target the FWA locus will be fused to the MS2 aptamer to guide the MS2-TET1-CD fusion protein to FWA via the dCAS9 protein.

Other RNA-binding proteins may also be used in place of MS2, such as PP7 and COM.

Construction of TET1-CD Fusion Proteins and gRNA-fwa

Cloning of m4UC_dCas9_MS2_TET1-CD_gRNAMS2. For this purpose, the m4UC_UBQ10_dCas9 vector will be used. This vector contains 2 kb of the 5′ promoter of the Arabidopsis UBQ10 gene driving expression of a plant codon-optimized dCas9 that is fused at its C-terminus to a 1×HA tag and N7 Nuclear Localization Signals (N7-NLS). A catalytically inactive Cas9, dCas9, will be generated by site-directed mutagenesis to introduce the D10A and H840A amino acid substitutions. Next, a modified pMDC123 vector (Curtis et al, Plant Phys, 2003) containing 700 bp of the 3′ OCS terminator will be used. 2 kb of UBQ10 promoter, the MS2 binding protein sequence containing a 3×GGGGS flexible linker, one NLS (Konermann et al Nature. 2014), and a 2×Flag sequence will be PCR amplified and cloned in this order by In-Fusion (Clontech) into the unique AscI site upstream of the gateway cassette of the modified pMDC123 to create pMDC123_MS2. The fragment of pMDC123_MS2 containing the UBQ10 promoter_MS2_GatewayCassette_OCS terminator will be PCR amplified and inserted by In-Fusion (Clontech) into the unique PmeI site of the m4UC_UBQ10_dCas9 vector to create the m4UC_MS2 vector. A pENTR vector (Invitrogen) containing a cDNA of the TET1 catalytic domain (TET1-CD) will be used to deliver TET1-CD into m4UC_MS2 by LR reaction (Invitrogen) to create the m4UC_MS2_TET1_CD vector. Finally, the Arabidopsis U6 promoter and a gRNA with MS2 loops at the tetraloop and stemloop 2 (Konermann et al Nature. 2014) will be PCR amplified and cloned into the unique PmeI site of the m4UC_MS2_TET1_CD vector by In-Fusion (Clontech). Different 20 nt-long gRNA protospacers against the FWA promoter will be cloned into the gRNA_MS2 cassette by PCR. In order to change the target sequence present in the different gRNAs, the protocol described in Li et al., 2013 using the plasmid pUC-gRNA will be followed.

The exemplary expression cassette of m4UC_dCas9_MS2_TET1-CD_gRNAMS2 will contain a number of features. The nucleotide sequence of the expression cassette is presented in SEQ ID NO: 54. This cassette is described as a single cassette, but contains a number of different expression regions: (1) one that encodes a gRNA targeting the FWA promoter, (2) one that encodes the dCAS9 coding region, and (3) one that encodes the MS2-TET1-CD fusion protein. The cassette includes a gRNA (SEQ ID NO: 55), a U6 promoter (SEQ ID NO: 56), an OCS terminator (SEQ ID NO: 57), TET1-CD (SEQ ID NO: 58), 2×FLAG (SEQ ID NO: 59), NLS (SEQ ID NO: 60), 3×GGGGS (SEQ ID NO: 61), MS2 (SEQ ID NO: 62), UBQ10 promoter (SEQ ID NO: 63), Insulator (SEQ ID NO: 64), UBQ10 promoter (SEQ ID NO: 65), Omega enhancer (SEQ ID NO: 66), dCAS9 (SEQ ID NO: 67), and an OCS terminator (SEQ ID NO: 68).

The amino acid sequence of the polypeptide fusion of dCAS9_HA_7N-NLS is presented in SEQ ID NO: 69. The following amino acid sequences are present in this fusion protein: dCAS9 (SEQ ID NO: 70), 1×HA (SEQ ID NO: 71), and 7N-NLS (SEQ ID NO: 72).

The amino acid sequence of the polypeptide fusion of MS2_3×GGGGS_NLS_2×Flag_TET1-CD is presented in SEQ ID NO: 73. The following amino acid sequences are present in this fusion protein: MS2 (SEQ ID NO: 74), 3×GGGGS (SEQ ID NO: 75), NLS (SEQ ID NO: 76), 2×FLAG (SEQ ID NO: 77), and TET1-CD (SEQ ID NO: 78).

To target the FWA locus, various gRNA sequences will be tested, as presented in Table 3A.

Various gRNA sequences will also be present in a series of tRNA-gRNA expression cassettes. CRISPR-targeting technology involving tRNA-gRNA expression cassettes is described in Xie et al, PNAS (2015). This will allow for the delivery of multiple gRNAs simultaneously with high expression level.

TABLE 3A
gRNA Molecules Targeting the FWA Promoter

gRNA Name   crRNA Sequence (5′ → 3′)
gRNA3       ATTCTCGACGGAAAGATGTA (SEQ ID NO: 171)
gRNA4       ACGGAAAGATGTATGGGCTT (SEQ ID NO: 172)
gRNA12      TTCATACGAGCGCCGCTCTA (SEQ ID NO: 173)
gRNA14      CCATTGGTCCAAGTGCTATT (SEQ ID NO: 174)
gRNA16      GCGGCGCAAGATCTGATATT (SEQ ID NO: 175)
gRNA17      AAAACTAGGCCATCCATGGA (SEQ ID NO: 176)

An appropriate crRNA sequence will be used in the gRNA structure described above (See SEQ ID NO: 55). FIG. 7 illustrates how various crRNA sequences and the flanking PAM sequence map to the FWA locus.

For tRNA-gRNA cassettes, one exemplary tRNA-gRNA expression cassette will contain two different gRNA molecules: gRNA4 and gRNA17. This cassette will be called U6p::tRNA-4-17, and the nucleotide sequence of this cassette is presented in SEQ ID NO: 46. Other features of this cassette include a U6 promoter (SEQ ID NO: 47), tRNA (SEQ ID NO: 48), gRNA backbone (SEQ ID NO: 49), and a PolIII terminator sequence (TTTTTTT).

Another exemplary tRNA-gRNA expression cassette will contain four different gRNA molecules: gRNA16, gRNA14, gRNA3, and gRNA17. This cassette will be called U6p::tRNA-16-14-3-17, and the nucleotide sequence of this cassette is presented in SEQ ID NO: 51. Other features of this cassette include a U6 promoter (SEQ ID NO: 47), tRNA (SEQ ID NO: 48), gRNA backbone (SEQ ID NO: 49), and a PolIII terminator sequence (TTTTTTT).

Transformation of Col-0 Plants

The construct described above will be transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct is transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well-known in the art.

Flowering Time Measurements

Progeny of transformed plants (T1s) will be planted and screened for BASTA-resistant plants that incorporate the T-DNA into the Arabidopsis genome, which confers resistance to BASTA. Among the BASTA-resistant transgenic plants, flowering time will be measured and compared to early-flowering wild-type Col-0 and late-flowering fwa-4 plants. Flowering time will be measured by counting the total number of leaves (rosette and cauline) of each individual plant.

Data Analysis

Plants transformed with the fusion constructs described above will be evaluated for phenotypic differences as compared to corresponding control plants (e.g. wild-type plants and fwa-4 plants) which are suggestive of successful targeting of the TET1 polypeptide to the locus of interest and subsequent de-methylation and/or transcriptional activation at the locus. The phenotype evaluated may vary depending on the locus targeted. Other analyses to be performed may include measuring the expression level of the targeted locus in the transformed plants, measuring the degree of DNA methylation at the targeted locus in the transformed plants (using e.g. bisulfite sequencing), or other assays well-known to those of skill in the art.

It is thought that the targeting scheme described in this Example will be able to successfully target a locus of interest and induce DNA de-methylation of the target locus.

Example 4: Modified CRISPR-Targeting of TET1 Polypeptide to Specific Loci Using SunTag Constructs

This Example describes exemplary experimental guidelines for constructing recombinant constructs for use in a modified CRISPR-targeting scheme involving TET1 polypeptides as disclosed herein, dCAS9 proteins, and SunTag constructs. These constructs may be used to target a TET1 polypeptide to a specific locus of a genome using the CRISPR-CAS9 system.

Example 2 describes the recombinant fusing of TET1 polypeptides to a dCAS9 protein to target TET1 to a specific locus (e.g. FWA locus). However, it is possible that in some instances, the fusion between the TET1 polypeptide and the dCAS9 protein may impact the function of the TET1 polypeptide, the dCAS9 protein, or both the TET1 polypeptide and the dCAS9 protein. Indeed, it is already known that recombinant fusion of heterologous proteins to CAS9 proteins can impact CAS9 function. For example, Morita et al (Nature Biotechnology 34, 1060-1065 (2016)) demonstrated that targeted demethylation using TET1 in animal cells is more efficient using the SunTag system, where TET1 is not fused directly to dCas9, as compared to standard straight fusions of TET1 to dCas9 through a small linker.

A technique called SunTag was developed to recruit many effector proteins simultaneously to a location via one dCAS9 protein. In this way, there is an amplification of the effect of targeting, and improved magnitude of gene regulation (Tanenbaum et al, 2014). Tanenbaum et al. described that a dCas9 protein was fused to an unstructured peptide that contains up to 24 copies of the GCN4 epitope. A single chain antibody, scFv, designed to bind this peptide sequence with high affinity and specificity, was fused to an effector protein for gene regulation. Co-expression of the two components allows binding of up to 24 copies of the antibody-fused effector protein to each CAS9-GCN4 fusion protein. In the case of VP64 as an effector protein, this procedure resulted in very high activation of gene expression compared to simple CAS9-VP64 fusion proteins.

Recently, Morita et al (Nature Biotechnology 34, 1060-1065 (2016)) described a SunTag system that is capable of triggering targeted demethylation when using the TET1 catalytic domain (TET1-CD) in mammalian cells and systems. In this system, dCas9 is fused to an unstructured peptide that contains 5 copies of the GCN4 epitope. A single chain antibody, scFv, designed to bind this peptide sequence with high affinity and specificity, is fused to TET1-CD. Co-expression of the two components allowed binding of up to 5 copies of the antibody-fused effector protein to each Cas9-GCN4 protein. In the case of TET1-CD as an effector protein, this procedure resulted in very high demethylation compared to straight fusions of TET1-CD to dCAS9.

A similar technique will be used herein to allow multiple copies of a TET1 polypeptide to bind a dCAS9-GCN4 fusion protein. The diagram presented in FIG. 8 illustrates an exemplary scheme of this targeting system. A guide RNA designed to target the FWA locus will be co-expressed under the control of a U6 promoter, as in the schemes described above.

Construction of TET1-CD Fusion Proteins and gRNA-fwa

Construction of UBQ10_dCAS9_1×HA_2×NLS_5×GCN4_UBQ10_scFV_sfGFP_1×HA_2×NLS_TET1CD. For this purpose, a modified pMTN3164 plasmid and a modified pC1300 plasmid will be created first. dCAS9_1×HA_2×NLS_10×GCN4 will be created through gene synthesis and will be cloned downstream of a fragment containing 1986 bp of the promoter region of the Arabidopsis UBQ10 gene followed by an omega RBC translational enhancer, creating pMTN3164 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4 and pC1300 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4. A second fragment containing 1986 bp of the promoter region of the Arabidopsis UBQ10 gene will be cloned downstream of the 10×GCN4 in the pMTN3164 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4 or pC1300 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4 vectors, followed by scFV, sfGFP, 1×HA tag, 2×NLS, and TET1-CD sequence that will be created through gene synthesis, creating pMTN3164 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4_UBQ10_scFV_sfGFP_1×HA_2×NLS_TET1CD and pC1300 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4_UBQ10_scFV_sfGFP_1×HA_2×NLS_TET1CD. A gRNA cassette driven by a U6 promoter expressing a single gRNA will be inserted at the PmeI restriction site of pMTN3164 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4_UBQ10_scFV_sfGFP_1×HA_2×NLS_TET1CD or pC1300 UBQ10_dCAS9_1×HA_2×NLS_5×GCN4_UBQ10_scFV_sfGFP_1×HA_2×NLS_TET1CD.

The exemplary expression cassette of UBQ10_dCAS9_1×HA_2×NLS_5×GCN4_UBQ10_scFV_sfGFP_1×HA_2×NLS_TET1-CD will contain a number of features. The nucleotide sequence of the expression cassette is presented in SEQ ID NO: 79. This cassette is described as a single cassette, but contains a number of different expression regions: (1) one that encodes a gRNA targeting the FWA promoter, (2) one that encodes the dCAS9-5×GCN4 fusion protein, and (3) one that encodes the scFv-TET1-CD fusion protein. The cassette includes U6::gRNA (SEQ ID NO: 80), a UBQ10 promoter (SEQ ID NO: 81), Omega RBC (SEQ ID NO: 82), dCAS9 (SEQ ID NO: 83), 1×HA (SEQ ID NO: 84), 2×NLS (SEQ ID NO: 85), linker (SEQ ID NO: 86), 5×GCN4 (SEQ ID NO: 87), OCS terminator (SEQ ID NO: 88), insulator (SEQ ID NO: 89), scFv (SEQ ID NO: 90), sfGFP (SEQ ID NO: 91), TET1-CD (SEQ ID NO: 92), and NOS terminator (SEQ ID NO: 93).

The amino acid sequence of the polypeptide fusion of dCAS9_1×HA_2×NLS_5×GCN4 is presented in SEQ ID NO: 94. Relevant amino acid sequences present in this fusion protein include, for example: dCAS9 (SEQ ID NO: 95), 1×HA (SEQ ID NO: 96), 2×NLS (SEQ ID NO: 97), linker (SEQ ID NO: 98), and 5×GCN4 (SEQ ID NO: 99).

The amino acid sequence of the polypeptide fusion of scFV_sfGFP_1×HA_2×NLS_TET1CD is presented in SEQ ID NO: 100. Relevant amino acid sequences present in this fusion protein include, for example: scFv (SEQ ID NO: 101), sfGFP (SEQ ID NO: 102), and TET1-CD (SEQ ID NO: 103).

A construct similar to the one above will also be made, but will contain 10×GCN4 (SEQ ID NO: 104) instead of 5×GCN4.

To target the FWA locus, various gRNA sequences will be tested, as presented in Table 4A.

Various gRNA sequences will also be present in a series of tRNA-gRNA expression cassettes. CRISPR-targeting technology involving tRNA-gRNA expression cassettes is described in Xie et al., PNAS (2015). This will allow multiple gRNAs to be delivered simultaneously at a high expression level.

TABLE 4A
gRNA Molecules Targeting the FWA Promoter

gRNA Name    crRNA Sequence (5′ → 3′)
gRNA3        ATTCTCGACGGAAAGATGTA (SEQ ID NO: 171)
gRNA4        ACGGAAAGATGTATGGGCTT (SEQ ID NO: 172)
gRNA12       TTCATACGAGCGCCGCTCTA (SEQ ID NO: 173)
gRNA14       CCATTGGTCCAAGTGCTATT (SEQ ID NO: 174)
gRNA16       GCGGCGCAAGATCTGATATT (SEQ ID NO: 175)
gRNA17       AAAACTAGGCCATCCATGGA (SEQ ID NO: 176)

An appropriate crRNA sequence will be used in the gRNA structure described above (see SEQ ID NO: 80). FIG. 7 illustrates how various crRNA sequences and the flanking PAM sequence map to the FWA locus.
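As a quick consistency check on Table 4A, the crRNA sequences can be held in a small lookup table and validated before they are placed into the gRNA structure. The following Python sketch is illustrative only: the names FWA_CRRNAS, gc_fraction and is_valid_crrna are hypothetical, and the 20-nucleotide length is simply the length of each sequence as listed in the table.

```python
# crRNA sequences copied from Table 4A (5' -> 3'); the dictionary name is illustrative.
FWA_CRRNAS = {
    "gRNA3": "ATTCTCGACGGAAAGATGTA",
    "gRNA4": "ACGGAAAGATGTATGGGCTT",
    "gRNA12": "TTCATACGAGCGCCGCTCTA",
    "gRNA14": "CCATTGGTCCAAGTGCTATT",
    "gRNA16": "GCGGCGCAAGATCTGATATT",
    "gRNA17": "AAAACTAGGCCATCCATGGA",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_valid_crrna(seq: str, expected_length: int = 20) -> bool:
    """True if seq has the expected length and contains only A, C, G and T."""
    return len(seq) == expected_length and set(seq.upper()) <= set("ACGT")


if __name__ == "__main__":
    for name, seq in FWA_CRRNAS.items():
        print(name, seq, is_valid_crrna(seq), round(gc_fraction(seq), 2))
```

A check of this kind only catches transcription errors (wrong length, stray characters); it says nothing about targeting efficiency.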

For tRNA-gRNA cassettes, one exemplary tRNA-gRNA expression cassette will contain two different gRNA molecules: gRNA4 and gRNA17. This cassette will be called U6p::tRNA-4-17, and the nucleotide sequence of this cassette is presented in SEQ ID NO: 46. Other features of this cassette include a U6 promoter (SEQ ID NO: 47), tRNA (SEQ ID NO: 48), gRNA backbone (SEQ ID NO: 49), and a PolIII terminator sequence (TTTTTTT).

Another exemplary tRNA-gRNA expression cassette will contain four different gRNA molecules: gRNA16, gRNA14, gRNA3, and gRNA17. This cassette will be called U6p::tRNA-16-14-3-17, and the nucleotide sequence of this cassette is presented in SEQ ID NO: 51. Other features of this cassette include a U6 promoter (SEQ ID NO: 47), tRNA (SEQ ID NO: 48), gRNA backbone (SEQ ID NO: 49), and a PolIII terminator sequence (TTTTTTT).
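The arrangement of parts in the two tRNA-gRNA cassettes can be sketched as an ordered list of labels. This Python sketch assumes the tandem tRNA, crRNA, gRNA backbone arrangement of the tRNA-gRNA approach cited above; the function name cassette_layout and the label strings are hypothetical, and the actual sequences are those of the cited SEQ ID NOs, not anything generated here.

```python
def cassette_layout(grna_names):
    """Return ordered part labels for a U6-driven tRNA-gRNA cassette.

    Assumption: each gRNA unit is tRNA + crRNA + gRNA backbone, repeated
    once per gRNA, between a single U6 promoter and a PolIII terminator.
    """
    parts = ["U6 promoter (SEQ ID NO: 47)"]
    for name in grna_names:
        parts.append("tRNA (SEQ ID NO: 48)")
        parts.append(f"{name} crRNA")
        parts.append("gRNA backbone (SEQ ID NO: 49)")
    parts.append("PolIII terminator (TTTTTTT)")
    return parts


if __name__ == "__main__":
    # U6p::tRNA-4-17 and U6p::tRNA-16-14-3-17, in the gRNA order given above
    for grnas in (["gRNA4", "gRNA17"], ["gRNA16", "gRNA14", "gRNA3", "gRNA17"]):
        print(" | ".join(cassette_layout(grnas)))
```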

Transformation of Col-0 Plants

The construct described above will be transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct is transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well-known in the art.

Flowering Time Measurements

Progeny of transformed plants (T1s) will be planted and screened for BASTA resistance, which is conferred by incorporation of the T-DNA into the Arabidopsis genome. Among the BASTA-resistant transgenic plants, flowering time will be measured and compared to early-flowering wild-type Col-0 and late-flowering fwa-4 plants. Flowering time will be measured by counting the total number of leaves (rosette and cauline) of each individual plant.

Data Analysis

Plants transformed with the fusion constructs described above will be evaluated for phenotypic differences as compared to corresponding control plants (e.g. wild-type plants and fwa-4 plants) which are suggestive of successful targeting of the TET1 polypeptide to the locus of interest and subsequent de-methylation and/or transcriptional activation at the locus. The phenotype evaluated may vary depending on the locus targeted. Other analyses to be performed may include measuring the expression level of the targeted locus in the transformed plants, measuring the degree of DNA methylation at the targeted locus in the transformed plants (using e.g. bisulfite sequencing), or other assays well-known to those of skill in the art.

It is thought that the targeting scheme described in this Example will be able to successfully target a locus of interest and induce DNA de-methylation of the target locus.

Example 5: SunTag-Based Targeting of TET1 to the FWA Locus

In the present Example, Applicant used the SunTag targeting scheme to target a TET1 polypeptide to the FWA locus in Arabidopsis using the CRISPR-CAS9 system.

Example 4 describes an exemplary SunTag-based targeting scheme to target a TET1 polypeptide to a target nucleic acid. This Example describes a successful SunTag targeting scheme in which a TET1 polypeptide was targeted to the FWA locus in Arabidopsis using the CRISPR-CAS9 system. A schematic of the targeting system is presented in FIG. 9.

Materials and Methods

Construction of:

gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS

For this purpose, a dCAS9_1×HA_3×NLS_10×GCN4 that contains a 22aa spacer between epitopes (dCAS9_1×HA_3×NLS_10×GCN422aa) and a dCAS9_1×HA_3×NLS_10×GCN4 that contains a 14aa spacer between epitopes (dCAS9_1×HA_3×NLS_10×GCN414aa) were created through a combination of gene synthesis and the utilization of plasmids from Addgene, and separately cloned into a modified pMTN3164 plasmid downstream of a fragment containing 1986 bp of the promoter region of the Arabidopsis UBQ10 gene followed by an omega RBC translational enhancer and upstream of an OCS terminator, creating pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS. An insulator sequence followed by a second fragment containing 1986 bp of the promoter region of the Arabidopsis UBQ10 gene was then cloned upstream of UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS such that transcription of downstream targets resulting from this second UBQ10 promoter would occur opposite the dCAS9_1×HA_3×NLS_10×GCN422aa or dCAS9_1×HA_3×NLS_10×GCN414aa transcription. A NOS terminator was then cloned downstream of this second UBQ10 promoter in both the UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS constructs, creating pMTN3164 NOS_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 NOS_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS. An scFv_sfGFP_1×HA_2×NLS_TET1CD sequence created through a combination of gene synthesis and the utilization of plasmids from Addgene was then cloned downstream of the second UBQ10 promoter and upstream of the NOS terminator in both vectors, creating pMTN3164 NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS. For both vectors, a gRNA4 cassette driven by a U6 promoter expressing a single gRNA4 was inserted at the PmeI restriction site of pMTN3164 downstream of the NOS terminator, creating gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS.

The expression cassettes of gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS differ only in the 10×GCN4 sequence. These vectors contain a number of features. The nucleotide sequences of the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS expression cassettes are presented in SEQ ID NO: 105 and SEQ ID NO: 106, respectively. These cassettes are described as single cassettes, but contain many different expression regions: (1) one that encodes gRNA4 (see Example 4) targeting the FWA promoter, (2) one that encodes the dCAS9-10×GCN4 fusion protein, and (3) one that encodes the scFv-sfGFP-TET1-CD fusion protein. The cassette includes U6::gRNA4 (SEQ ID NO: 107), a UBQ10 promoter (SEQ ID NO: 108), Omega RBC (SEQ ID NO: 109), dCAS9 (SEQ ID NO: 110), 1×HA (SEQ ID NO: 111), 3×NLS (SEQ ID NO: 112), 2×NLS (SEQ ID NO: 113), linker (SEQ ID NO: 114), 10×GCN422aa (SEQ ID NO: 115) or 10×GCN414aa (SEQ ID NO: 116), OCS terminator (SEQ ID NO: 117), insulator (SEQ ID NO: 118), scFv (SEQ ID NO: 119), sfGFP (SEQ ID NO: 120), TET1-CD (SEQ ID NO: 121), and NOS terminator (SEQ ID NO: 122).

The amino acid sequence of the polypeptide fusion of dCAS9_1×HA_3×NLS_10×GCN422aa is presented in SEQ ID NO: 123 and the amino acid sequence of the polypeptide fusion of dCAS9_1×HA_3×NLS_10×GCN414aa is presented in SEQ ID NO: 124. Relevant amino acid sequences present in these fusion proteins include, for example: dCAS9 (SEQ ID NO: 125), 1×HA (SEQ ID NO: 126), 3×NLS (SEQ ID NO: 127), linker (SEQ ID NO: 128), and 10×GCN422aa (SEQ ID NO: 129) or 10×GCN414aa (SEQ ID NO: 130).

The amino acid sequence of the polypeptide fusion of scFv_sfGFP_1×HA_2×NLS_TET1CD is presented in SEQ ID NO: 131 and is identical in both gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS vectors. Relevant amino acid sequences present in this fusion protein include, for example: scFv (SEQ ID NO: 132), sfGFP (SEQ ID NO: 133), 1×HA (SEQ ID NO: 134), 2×NLS (SEQ ID NO: 135), linker (SEQ ID NO: 136), and TET1-CD (SEQ ID NO: 137).

Plant Transformation and Flowering Time Measurement

The constructs described above were transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct was transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well known in the art. Progeny of transformed plants (T1s) were screened for hygromycin resistance. Among the hygromycin-resistant transgenic plants, flowering time was measured and compared to early-flowering wild-type Col-0 and late-flowering fwa-4 plants. Flowering time was measured by counting the total number of leaves (rosette and cauline) of each individual plant.

Bisulfite Sequencing and Data Analysis

Whole-genome bisulfite sequencing (BS-Seq) libraries were generated as previously reported (Cokus et al., 2008), and all libraries were sequenced using the HiSeq 2000 platform following the manufacturer's instructions (Illumina) at a length of 50 bp. BS-Seq reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-map-2.74. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

RNA-Seq

Raw reads in qseq format obtained from the sequencer were first converted to fastq format with a customized Perl script. Read quality was controlled with FastQC (worldwide web:bioinformatics.babraham.ac.uk/projects/fastqc). High-quality reads were then aligned to the TAIR10 reference genome using TopHat (Trapnell et al., 2009) (v 2.0.13) with the ‘--no-coverage-search’ option, allowing up to two mismatches and only keeping reads that mapped to one location. Essentially, reads were first mapped to the TAIR10 gene annotation with known splice junctions. When reads did not map to the annotated genes, the reads were mapped to the TAIR10 genome. The number of reads mapping to genes was calculated by HTSeq (Anders et al., 2015) (v 0.5.4) with default parameters. Expression levels were determined by RPKM (reads per kilobase of exons per million aligned reads) in R using customized scripts.
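The RPKM definition given above can be written out as a short calculation. The sketch below is in Python rather than R and is not the customized script referred to in the text; the function name rpkm and its parameter names are hypothetical, and it assumes that the exon length is the summed exon length of the gene in base pairs.

```python
def rpkm(read_count: float, exon_length_bp: float, total_aligned_reads: float) -> float:
    """Reads per kilobase of exons per million aligned reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_aligned_reads / 1e6)
         = read_count * 1e9 / (exon_length_bp * total_aligned_reads)
    """
    if exon_length_bp <= 0 or total_aligned_reads <= 0:
        raise ValueError("exon length and total aligned reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_aligned_reads)


if __name__ == "__main__":
    # Hypothetical numbers: 500 reads on a gene with 2,000 bp of exons,
    # in a library of 10 million aligned reads.
    print(rpkm(500, 2000, 10_000_000))
```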

Results

To explore whether gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS can trigger demethylation and reactivate FWA expression, wild-type Col-0 plants were transformed with the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS construct described above. Flowering time of T1 transgenic plants was assayed, and results are presented below in Table 5A.

TABLE 5A
Flowering Time Results

Line    Early Flowering    Late Flowering
gRNA4_U6_TET1CD_2xNLS_1xHA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1xHA_3xNLS_10xGCN422aa    7    2
gRNA4_U6_NOS_TET1CD_2xNLS_1xHA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1xHA_3xNLS_10xGCN414aa_OCS    1    2

The results presented in Table 5A demonstrate that targeting the TET1 catalytic domain (TET1-CD) to the FWA locus using the SunTag system can efficiently promote late flowering in wild-type plants.

To test if the late flowering phenotype of plants containing the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS construct described in Table 5A is due to the loss of methylation in the FWA promoter, whole-genome BS-Seq experiments were conducted as described above. The results, presented in FIG. 10 and FIG. 11 for plants containing the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct and FIG. 12 and FIG. 13 for plants containing the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS construct, show a loss of methylation in the FWA promoter in backgrounds that contain the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS constructs and that this demethylation was specific to the FWA promoter (FIG. 11 and FIG. 13).

To test if the late flowering observed in gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS lines was due to the activation of FWA expression, RNA-seq was performed with one independent T1 line for gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and two independent T1 lines for gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS. The results presented in FIG. 14 show that FWA was upregulated in all transgenic lines tested compared to expression in Col-0 wild-type plants, while two control housekeeping genes remained unaffected.

The results presented in this Example demonstrate that the specific targeting of the TET1 catalytic domain to a genomic region of interest can be used to target demethylation and gene activation in plants in a very specific manner.

Example 6: SunTag-Based Targeting of TET1 to the CACTA1 Locus

In the present Example, Applicant used the SunTag targeting scheme to target a TET1 polypeptide to the CACTA1 locus in Arabidopsis using the CRISPR-CAS9 SunTag system.

Example 4 describes an exemplary SunTag-based targeting scheme to target a TET1 catalytic polypeptide to a target nucleic acid. This Example describes a successful SunTag targeting scheme in which a TET1 polypeptide was targeted to the CACTA1 locus in Arabidopsis using the CRISPR-CAS9 system. A schematic of the targeting system is presented in FIG. 15.

Materials and Methods

Construction of:

-   -   CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS

For this purpose, a dCAS9_1×HA_3×NLS_10×GCN4 that contains a 22aa spacer between epitopes (dCAS9_1×HA_3×NLS_10×GCN422aa) was created through a combination of gene synthesis and the utilization of plasmids from Addgene and separately cloned into a modified pMTN3164 (also called pMOA) plasmid downstream of a fragment containing 1994 bp of the promoter region of the Arabidopsis UBQ10 gene followed by an omega RBC translational enhancer and upstream of an OCS terminator, creating pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS. An insulator sequence followed by a second fragment containing 1986 bp of the promoter region of the Arabidopsis UBQ10 gene was then cloned upstream of UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS such that transcription of downstream targets resulting from this second UBQ10 promoter would occur opposite the dCAS9_1×HA_3×NLS_10×GCN422aa transcription. Sequences created through a combination of gene synthesis and the utilization of plasmids from Addgene were then cloned downstream of the second UBQ10 promoter, creating pMTN3164 TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS. A NOS terminator was then cloned downstream of TET1cd in the TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct, creating pMTN3164 NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS. A CACTA1gRNA2 cassette driven by a U6 promoter expressing a single CACTA1gRNA2 was inserted at the PmeI restriction site of pMTN3164 downstream of the NOS terminator, creating CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS.

The expression cassette of CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS contains a number of features. The nucleotide sequence of the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS expression cassette is presented in SEQ ID NO: 142. This cassette is described as a single cassette, but contains many different expression regions: (1) one that encodes CACTA1gRNA2 targeting the CACTA1 promoter, (2) one that encodes the dCAS9-10×GCN4 fusion protein, and (3) one that encodes the scFv-sfGFP-TET1-CD fusion protein. The cassette includes U6::CACTA1gRNA2 (SEQ ID NO: 143), a UBQ10 promoter (SEQ ID NO: 108), Omega RBC (SEQ ID NO: 109), dCAS9 (SEQ ID NO: 110), 1×HA (SEQ ID NO: 111), 3×NLS (SEQ ID NO: 112), 2×NLS (SEQ ID NO: 113), linkers (SEQ ID NO: 114), 10×GCN422aa (SEQ ID NO: 115), OCS terminator (SEQ ID NO: 117), insulator (SEQ ID NO: 118), scFv (SEQ ID NO: 119), sfGFP (SEQ ID NO: 120), TET1-CD (SEQ ID NO: 121), and NOS terminator (SEQ ID NO: 122).

The amino acid sequence of the polypeptide fusion of dCAS9_1×HA_3×NLS_10×GCN422aa is presented in SEQ ID NO: 123. Relevant amino acid sequences present in this fusion protein include, for example: dCAS9 (SEQ ID NO: 125), 1×HA (SEQ ID NO: 126), 3×NLS (SEQ ID NO: 127), linker (SEQ ID NO: 128), and 10×GCN422aa (SEQ ID NO: 129).

The amino acid sequence of the polypeptide fusion of scFv_sfGFP_1×HA_2×NLS_TET1CD is presented in SEQ ID NO: 131. Relevant amino acid sequences present in this fusion protein include, for example: scFv (SEQ ID NO: 132), sfGFP (SEQ ID NO: 133), 1×HA (SEQ ID NO: 134), 2×NLS (SEQ ID NO: 135), linkers (SEQ ID NO: 136), and TET1-CD (SEQ ID NO: 137).

Plant Transformation

The constructs described above were transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct was transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well known in the art. Progeny of transformed plants (T1s) were screened for hygromycin resistance.

Bisulfite Sequencing and Data Analysis

Whole-genome bisulfite sequencing (BS-Seq) libraries were generated as previously reported (Cokus et al., 2008), and all libraries were sequenced using the HiSeq 4000 platform following the manufacturer's instructions (Illumina) at a length of 50 bp. BS-Seq reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-map-2.74. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

Metaplot of WGBS data

Metaplots of WGBS data were made using custom Perl and R scripts. Regions of interest were broken into 50 bins while flanking 1 kb regions were each broken into 25 bins. CG, CHG and CHH methylation levels in each bin were then determined. Metaplots were then generated with R.
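The binning scheme described above (25 bins per 1 kb flank, 50 bins across the region of interest) can be sketched as follows. This Python sketch is not the custom Perl and R code referred to in the text; the function names metaplot_bin and binned_methylation are hypothetical, coordinates are assumed to be half-open (start inclusive, end exclusive), strand is ignored, and averaging per-cytosine levels within a bin is only one possible way of determining the level in each bin.

```python
def metaplot_bin(pos, start, end, body_bins=50, flank_bp=1000, flank_bins=25):
    """Return the 0-based metaplot bin for a position, or None if outside the window.

    Bins 0-24 cover the upstream flank, 25-74 the region of interest,
    and 75-99 the downstream flank (with the default settings).
    """
    if start - flank_bp <= pos < start:
        return (pos - (start - flank_bp)) * flank_bins // flank_bp
    if start <= pos < end:
        return flank_bins + (pos - start) * body_bins // (end - start)
    if end <= pos < end + flank_bp:
        return flank_bins + body_bins + (pos - end) * flank_bins // flank_bp
    return None


def binned_methylation(sites, start, end):
    """Average methylation level (0 to 1) per bin for one context (CG, CHG or CHH).

    sites is an iterable of (position, level) pairs; bins without any
    cytosine are reported as None.
    """
    n_bins = 25 + 50 + 25
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, level in sites:
        b = metaplot_bin(pos, start, end)
        if b is not None:
            totals[b] += level
            counts[b] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

Running binned_methylation once per context and per region, then averaging each bin across regions, would give the three curves of a metaplot.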

Quantitative Real-Time PCR

Among the hygromycin-resistant transgenic plants, CACTA1 gene expression was measured and compared to CACTA1 gene expression in wild-type Col-0. Gene expression was measured by performing quantitative real-time PCR (qPCR) on each individual plant. qPCR was done using the oligos (5′-agtgtttcaatcaaggcgtttc-3′) (SEQ ID NO: 177) and (5′-cacccaatggaacaaagtgaac-3′) (SEQ ID NO: 178) to amplify a region of the CACTA1 gene. As an internal control, CACTA1 expression values were normalized to the expression of the IPP2 housekeeping gene measured in the same sample using oligos (5′-gtatgagttgcttctccagcaaag-3′) (SEQ ID NO: 179) and (5′-gaggatggctgcaacaagtgt-3′) (SEQ ID NO: 180).
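Normalization of a target gene to a housekeeping gene is commonly computed from threshold-cycle (Ct) values. The text does not state which calculation was used, so the Python sketch below is an assumption: it uses the delta-Ct approach with an assumed amplification efficiency of 2 per cycle, and the function names relative_expression and fold_change, as well as the Ct values in the example, are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target expression relative to the reference gene: 2 ** -(Ct_target - Ct_reference).

    Assumes both amplicons double every cycle (efficiency of 2).
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(sample_rel: float, control_rel: float) -> float:
    """Ratio of normalized expression in a transgenic sample to a control sample."""
    return sample_rel / control_rel


if __name__ == "__main__":
    # Hypothetical Ct values: target gene and IPP2 in a T1 plant and in Col-0
    t1 = relative_expression(ct_target=22.0, ct_reference=20.0)
    col0 = relative_expression(ct_target=24.0, ct_reference=20.0)
    print(t1, col0, fold_change(t1, col0))
```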

Results

To explore if CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS can trigger demethylation and reactivate CACTA1 expression, wild-type Col-0 plants were transformed with the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct described above. CACTA1 expression was assayed using qPCR. The results presented in FIG. 16 demonstrate that targeting the TET1 catalytic domain (TET1-CD) to the CACTA1 locus using the SunTag system can efficiently reactivate CACTA1 expression.

To test if reactivation of CACTA1 expression in plants containing the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene described in FIG. 16 is due to the loss of methylation in the CACTA1 promoter, BS-Seq experiments were conducted as described above. The results, presented in FIG. 17 and FIG. 18, show a loss of methylation in the CACTA1 promoter in backgrounds that contain the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA_sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene, and show that this demethylation was specific to the CACTA1 promoter (FIG. 17 and FIG. 18), as regions flanking CACTA1 were mostly unaffected.

To test the specificity of the targeted demethylation caused by the expression of the CACTA1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene in T1 plants, genome-wide methylation levels were checked and compared with those of a Col-0 control plant. The results presented in FIG. 19 show that genome-wide DNA methylation levels were similar among all backgrounds examined, indicating that the TET1 fusion was specifically acting at its target.

The results presented in this Example demonstrate that the specific targeting of the TET1 catalytic domain to a genomic region of interest by the SunTag targeting scheme can be used to target demethylation and gene activation in plants in a very specific manner. This system can thus be used to study the role of DNA methylation at specific loci without the need for mutants or chemicals that impair genome-wide methylation levels. The successful demethylation of the promoter region of CACTA1 indicates that other TEs may also be amenable to targeted demethylation, which enables the exploration of the effects of TE activity upon genome integrity, as well as the reactivation of TEs for mutagenesis.

Example 7: SunTag-Based Targeting of TET1 to the ROS1 Locus

In the present Example, Applicant used the SunTag targeting scheme to target a TET1 polypeptide to the ROS1 locus in Arabidopsis using the CRISPR-CAS9 system.

Example 4 describes an exemplary SunTag-based targeting scheme to target a TET1 polypeptide to a target nucleic acid. This Example describes a successful SunTag targeting scheme in which a TET1 polypeptide was targeted to the ROS1 locus in Arabidopsis using the CRISPR-CAS9 system. A schematic of the targeting system is presented in FIG. 20.

Materials and Methods

Construction of:

-   -   ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS

For this purpose, a dCAS9_1×HA_3×NLS_10×GCN4 that contains a 22aa spacer between epitopes (dCAS9_1×HA_3×NLS_10×GCN422aa) was created through a combination of gene synthesis and the utilization of plasmids from Addgene and separately cloned into a modified pMTN3164 plasmid downstream of a fragment containing 1994 bp of the promoter region of the Arabidopsis UBQ10 gene followed by an omega RBC translational enhancer and upstream of an OCS terminator, creating pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS. An insulator sequence followed by a second fragment containing 1994 bp of the promoter region of the Arabidopsis UBQ10 gene was then cloned upstream of UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS such that transcription of downstream targets resulting from this second UBQ10 promoter would occur opposite the dCAS9_1×HA_3×NLS_10×GCN422aa transcription. An scFv_sfGFP_1×HA_2×NLS_TET1CD sequence created through a combination of gene synthesis and the utilization of plasmids from Addgene was then cloned downstream of the second UBQ10 promoter, creating pMTN3164 TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS. A NOS terminator was then cloned downstream of TET1cd in the TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct, creating pMTN3164 NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS. A ROS1gRNA2 cassette driven by a U6 promoter expressing a single ROS1gRNA2 was inserted at the PmeI restriction site of pMTN3164 downstream of the NOS terminator, creating ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS.

The ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS vector contains a number of features. The nucleotide sequence of the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS expression cassette is presented in SEQ ID NO: 144. This cassette is described as a single cassette, but contains many different expression regions: (1) one that encodes ROS1gRNA2 targeting the ROS1 promoter, (2) one that encodes the dCAS9-10×GCN4 fusion protein, and (3) one that encodes the scFv-sfGFP-TET1-CD fusion protein. The cassette includes U6::ROS1gRNA2 (SEQ ID NO: 145), a UBQ10 promoter (SEQ ID NO: 108), Omega RBC (SEQ ID NO: 109), dCAS9 (SEQ ID NO: 110), 1×HA (SEQ ID NO: 111), 3×NLS (SEQ ID NO: 112), 2×NLS (SEQ ID NO: 113), linkers (SEQ ID NO: 114), 10×GCN422aa (SEQ ID NO: 115), OCS terminator (SEQ ID NO: 117), insulator (SEQ ID NO: 118), scFv (SEQ ID NO: 119), sfGFP (SEQ ID NO: 120), TET1-CD (SEQ ID NO: 121), and NOS terminator (SEQ ID NO: 122).

The amino acid sequence of the polypeptide fusion of dCAS9_1×HA_3×NLS_10×GCN422aa is presented in SEQ ID NO: 123. Relevant amino acid sequences present in this fusion protein include, for example: dCAS9 (SEQ ID NO: 125), 1×HA (SEQ ID NO: 126), 3×NLS (SEQ ID NO: 127), linker (SEQ ID NO: 128), and 10×GCN422aa (SEQ ID NO: 129).

The amino acid sequence of the polypeptide fusion of scFv_sfGFP_1×HA_2×NLS_TET1CD is presented in SEQ ID NO: 131. Relevant amino acid sequences present in this fusion protein include, for example: scFv (SEQ ID NO: 132), sfGFP (SEQ ID NO: 133), 1×HA (SEQ ID NO: 134), 2×NLS (SEQ ID NO: 135), linkers (SEQ ID NO: 136), and TET1-CD (SEQ ID NO: 137).

Plant Transformation

The construct described above was transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct was transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well known in the art. Progeny of transformed plants (T1s) were screened for hygromycin resistance.

Quantitative Real-Time PCR

Among the hygromycin-resistant transgenic plants, ROS1 gene expression was measured and compared to ROS1 gene expression in wild-type Col-0. Gene expression was measured by performing quantitative real-time PCR (qPCR) on each individual plant. qPCR was done using the oligos (5′-caggcttgcttttggaaagggtacg-3′) (SEQ ID NO: 181) and (5′-gtgctctctcactcttaaccataagct-3′) (SEQ ID NO: 182) to amplify a region of the ROS1 gene. As an internal control, ROS1 expression values were normalized to the expression of the IPP2 housekeeping gene measured in the same sample using oligos (5′-gtatgagttgcttctccagcaaag-3′) (SEQ ID NO: 183) and (5′-gaggatggctgcaacaagtgt-3′) (SEQ ID NO: 184).

Bisulfite Sequencing and Data Analysis

Whole-genome bisulfite sequencing (BS-Seq) libraries were generated as previously reported (Cokus et al., 2008), and all libraries were sequenced using the HiSeq 4000 platform following the manufacturer's instructions (Illumina) at a length of 50 bp. BS-Seq reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-map-2.74. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

Results

ROS1 is an example of a gene whose expression depends on DNA methylation. Methylation mutants with lower DNA methylation in the ROS1 promoter show reduced ROS1 expression (Lei M, et al. (2015) Regulatory link between DNA methylation and active demethylation in Arabidopsis. Proc Natl Acad Sci 112(11):3553-3557; Williams B P, Pignatta D, Henikoff S, Gehring M (2015) Methylation-Sensitive Expression of a DNA Demethylase Gene Serves As an Epigenetic Rheostat. PLoS Genet 11(3):1-18). To explore if ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS can trigger demethylation and repress ROS1 expression, wild-type Col-0 plants were transformed with the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct described above. ROS1 expression was assayed using qPCR, and the results are presented in FIG. 21. The results presented in FIG. 21 demonstrate that targeting the TET1 catalytic domain (TET1-CD) to the ROS1 locus using the SunTag targeting scheme can efficiently repress ROS1 expression.

To test if repression of ROS1 expression in plants containing the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct described in FIG. 20 is due to the loss of methylation in the ROS1 promoter, whole-genome BS-Seq experiments were conducted as described above. The results, presented in FIG. 22 and FIG. 23 with differently scaled genome browser views, show a loss of methylation in the ROS1 promoter in backgrounds that contain the ROS1gRNA2_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct.

The results presented in this Example demonstrate that the specific targeting of the TET1 catalytic domain to a genomic region of interest can be used to target demethylation and gene repression in plants in a very specific manner. This Example shows that the SunTag system can be used to modify the methylation state of regulatory regions in the Arabidopsis genome. It provides the opportunity to explore the regulatory networks controlling the gene expression of specific loci.

Example 8: DNA-Binding Domain-Targeting of Demethylation Factor TET1 (Catalytic Domain) to the CACTA1 Locus in Arabidopsis

This Example demonstrates the targeting of the catalytic domain of a TET1 protein to the CACTA1 locus in Arabidopsis using synthetic Zinc Finger polypeptides.

Example 1 describes the successful TET1 polypeptide targeting scheme using a synthetic zinc finger designed to target the FWA locus. This Example describes a similar successful targeting scheme using a synthetic zinc finger designed to target the CACTA1 locus.

Materials and Methods

Cloning of pUBQ10::ZF1CACTA1_3×Flag_TET1CD and pUBQ10::ZF2CACTA1_3×Flag_TET1CD

For this purpose, a modified pMDC123 plasmid (Curtis et al, 2003, Plant Phys) was created first, containing 1990 bp of the promoter region of the Arabidopsis UBQ10 gene upstream of a cassette containing a HpaI restriction site and a 3×Flag tag, creating a pMDC123 pUBQ10_3×Flag vector. Both the UBQ10 promoter and 3×Flag are upstream of the gateway cassette (Invitrogen) present in the original pMDC123 plasmid. The TET1cd was delivered into the modified pMDC123 by an LR reaction (Invitrogen), creating an in-frame fusion of the TET1cd cDNA with the upstream 3×Flag cassette and resulting in a pMDC123 pUBQ10_3×Flag_TET1cd vector. ZF1CACTA1 or ZF2CACTA1 sequences were plant codon optimized, synthesized by IDT, and cloned into the HpaI restriction site of the modified pMDC123_3×Flag_TET1cd plasmid by In-Fusion (Takara), resulting in the creation of pMDC123 pUBQ10_ZF1CACTA1_3×Flag_TET1cd or pMDC123 pUBQ10_ZF2CACTA1_3×Flag_TET1cd vectors.

The nucleotide sequences of pUBQ10::ZF1CACTA1_3×Flag_TET1CD and pUBQ10::ZF2CACTA1_3×Flag_TET1CD are presented in SEQ ID NO: 146 and SEQ ID NO: 147, respectively. Each expression cassette contains a UBQ10 promoter (SEQ ID NO: 22), the ZF1CACTA1 or ZF2CACTA1 DNA binding domain that targets the CACTA1 promoter (SEQ ID NO: 148 or SEQ ID NO: 149, respectively), a 3×Flag tag (SEQ ID NO: 24), the catalytic domain of human TET1 (SEQ ID NO: 25), and an OCS terminator sequence (SEQ ID NO: 26). The pUBQ10::ZF1CACTA1_3×Flag_TET1CD and pUBQ10::ZF2CACTA1_3×Flag_TET1CD expression cassettes encode the ZF1CACTA1_3×Flag_TET1CD (SEQ ID NO: 150) and ZF2CACTA1_3×Flag_TET1CD (SEQ ID NO: 151) fusion proteins, respectively. Polypeptides in each fusion protein include ZF1CACTA1 (SEQ ID NO: 152) or ZF2CACTA1 (SEQ ID NO: 153), 3×Flag (SEQ ID NO: 29), and human TET1-CD (SEQ ID NO: 30).

Plant Transformation

The transgenes above were introduced into Col-0 wild-type Arabidopsis plants using Agrobacterium-mediated transformation. T1 transgenic plants were selected based on their resistance to BASTA.

Bisulfite Sequencing and Data Analysis

BS-Seq libraries were generated as previously reported (Cokus et al., 2008) and all libraries were sequenced using the HiSeq 2000 platform following manufacturer instructions (Illumina) at a length of 50 bp. Bisulfite-Seq (BS-Seq) reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-seeker. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.
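The read-filtering criteria stated above (at most two mismatches, unique mapping only) can be illustrated with a minimal Python sketch. The Alignment record and its field names are hypothetical and do not correspond to the output format of BS-seeker or of any other aligner; the sketch shows only the logic of the filter.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alignment:
    """Hypothetical alignment record; field names are illustrative only."""
    read_id: str
    mismatches: int  # mismatches between the read and the reference
    num_hits: int    # number of genomic locations the read aligned to


def filter_alignments(alignments: Iterable[Alignment],
                      max_mismatches: int = 2) -> List[Alignment]:
    """Keep reads with at most max_mismatches that map to exactly one location."""
    return [a for a in alignments
            if a.mismatches <= max_mismatches and a.num_hits == 1]


# Hypothetical example reads
reads = [
    Alignment("r1", 0, 1),  # kept
    Alignment("r2", 3, 1),  # dropped: too many mismatches
    Alignment("r3", 2, 2),  # dropped: maps to two locations
    Alignment("r4", 2, 1),  # kept
]
kept = filter_alignments(reads)
```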

Metaplot of WGBS Data

Metaplots of WGBS data were made using custom Perl and R scripts. Regions of interest were broken into 50 bins while flanking 1 kb regions were each broken into 25 bins. CG, CHG and CHH methylation levels in each bin were then determined. Metaplots were then generated with R.
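The binning scheme described above (50 bins across the region of interest and 25 bins across each 1 kb flank, for 100 bins in total) may be sketched in Python as follows. This is not the custom Perl and R code referred to above. The function names, the half-open coordinate convention, and the choice to pool methylated and total read counts within each bin are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def bin_index(pos: int, start: int, end: int, flank: int = 1000,
              body_bins: int = 50, flank_bins: int = 25) -> int:
    """Map a position to a bin over [start - flank, end + flank); -1 if outside.

    The region of interest is the half-open interval [start, end).
    """
    if pos < start - flank or pos >= end + flank:
        return -1
    if pos < start:  # upstream flank
        return (pos - (start - flank)) * flank_bins // flank
    if pos < end:    # region of interest
        return flank_bins + (pos - start) * body_bins // (end - start)
    # downstream flank
    return flank_bins + body_bins + (pos - end) * flank_bins // flank


def bin_methylation(sites: Iterable[Tuple[int, int, int]], start: int, end: int,
                    flank: int = 1000, body_bins: int = 50,
                    flank_bins: int = 25) -> List[Optional[float]]:
    """sites holds (position, methylated_reads, total_reads) for one context.

    Returns one methylation level per bin (None where a bin has no coverage).
    """
    n_bins = 2 * flank_bins + body_bins
    meth = [0] * n_bins
    total = [0] * n_bins
    for pos, m, t in sites:
        b = bin_index(pos, start, end, flank, body_bins, flank_bins)
        if b >= 0:
            meth[b] += m
            total[b] += t
    return [m / t if t else None for m, t in zip(meth, total)]
```

Running the function once per sequence context (CG, CHG, CHH) on the cytosines of that context would give the three curves of a metaplot for a single region; averaging bin by bin over many regions would give the aggregate plot.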

RNA-seq

Raw reads in qseq format obtained from the sequencer were first converted to fastq format with a customized Perl script. Read quality was controlled with FastQC (worldwide web: bioinformatics.babraham.ac.uk/projects/fastqc). High quality reads were then aligned to the TAIR10 reference genome using Tophat (Trapnell et al, 2009) (v 2.0.13) with the ‘--no-coverage-search’ option, allowing up to two mismatches and keeping only reads that mapped to one location. Essentially, reads were first mapped to the TAIR10 gene annotation with known splice junctions. When reads did not map to the annotated genes, the reads were mapped to the TAIR10 genome. The number of reads mapping to genes was calculated by HTseq (Anders et al., 2015) (v 0.5.4) with default parameters. Expression levels were determined by RPKM (reads per kilobase of exons per million aligned reads) in R using customized scripts.
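For reference, the RPKM definition given above corresponds to the following calculation, shown here as a short Python sketch rather than the customized R scripts that were used. The function name and the example numbers are hypothetical.

```python
def rpkm(read_count: int, exon_length_bp: int, total_aligned_reads: int) -> float:
    """Reads per kilobase of exons per million aligned reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_aligned_reads / 1e6)
         = read_count * 1e9 / (exon_length_bp * total_aligned_reads)
    """
    if exon_length_bp <= 0 or total_aligned_reads <= 0:
        raise ValueError("exon length and total aligned reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_aligned_reads)


# Hypothetical gene: 500 reads, 2,000 bp of exons, 10 million aligned reads
example = rpkm(500, 2000, 10_000_000)
```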

Quantitative Real-Time PCR

To assess the level of CACTA1 gene expression, quantitative Real-time PCR (qPCR) was done using the oligos (5′-agtgtttcaatcaaggcgtttc-3′) (SEQ ID NO: 185) and (5′-cacccaatggaacaaagtgaac-3′) (SEQ ID NO: 186) to amplify a region of the CACTA1 gene. As an internal control, CACTA1 expression values were normalized to the expression of the IPP2 housekeeping gene collected from the same sample using oligos (5′-gtatgagttgcttctccagcaaag-3′) (SEQ ID NO: 187) and (5′-gaggatggctgcaacaagtgt-3′) (SEQ ID NO: 188).
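The text above states only that CACTA1 values were normalized to IPP2; the formula used is not specified. One common way to perform such a normalization is the delta-Ct method, sketched below in Python under the assumptions that threshold cycle (Ct) values are available for both genes and that amplification efficiency is close to a doubling per cycle. The function names and Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        efficiency: float = 2.0) -> float:
    """Target expression relative to the reference gene: efficiency ** -(delta Ct).

    Assumes equal amplification efficiency for both primer pairs
    (2.0 means a perfect doubling per cycle).
    """
    return efficiency ** -(ct_target - ct_reference)


def fold_change(sample: float, control: float) -> float:
    """Ratio of two normalized expression values (sample over control)."""
    return sample / control


# Hypothetical Ct values: (CACTA1, IPP2) in a control plant and a transgenic plant
control_rel = relative_expression(25.0, 22.0)
transgenic_rel = relative_expression(23.0, 22.0)
change = fold_change(transgenic_rel, control_rel)
```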

Results

To test if ZF1CACTA1_TET1-CD or ZF2CACTA1_TET1-CD can reactivate the expression of CACTA1, wild-type Col-0 plants were transformed with either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD construct described above. Expression of CACTA1 was assayed by RNA-seq of individual T1 transgenic plants. The results presented in FIG. 24 demonstrate that the catalytic domain of human TET1 fused to either ZF1CACTA1 or ZF2CACTA1 can efficiently activate the expression of CACTA1.

To test if reactivation of CACTA1 expression in plants containing the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgenes described in FIG. 24 is due to the loss of methylation in the CACTA1 promoter, whole-genome BS-Seq experiments were conducted as described above. The results, presented in FIG. 25 and FIG. 26 for plants containing either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD transgene or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene, show a loss of methylation in the CACTA1 promoter in both backgrounds.

To test the specificity of the targeted demethylation caused by the expression of the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene in T1 plants, genome-wide methylation levels and methylation levels over all protein coding genes or TEs were checked and compared with those of a Col-0 control plant. The results, presented in FIG. 27 and FIG. 28, show that DNA methylation levels across the entire genome were slightly reduced as compared to the Col-0 control in plants containing either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene, indicating a partial non-specific global demethylation. Although this non-specific genome-wide demethylation was minor, it suggests that it is important to screen several transgenic lines carefully to find ones with limited off-target activity that retain high levels of on-target demethylation.

To test if the upregulation of CACTA1 gene expression is heritable, CACTA1 expression was checked using qPCR as described above in T2 backgrounds that had either retained the ZF1CACTA1_TET1-CD transgene or had the transgene segregated away. The results presented in FIG. 29 show that in backgrounds that have retained the ZF1CACTA1_TET1-CD transgene, CACTA1 gene expression continues to be upregulated, while in backgrounds that have lost the transgene, expression has been silenced to wild-type levels.

To test if the loss of methylation in the CACTA1 promoter in plants containing the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene is heritable, whole-genome BS-Seq experiments were conducted as described above on T2 plants that have either retained the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene, or had the transgene segregated away. The results, presented in FIG. 30 and FIG. 31 for plants containing either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD transgene or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene, show a loss of methylation in the CACTA1 promoter in backgrounds that have retained the transgene, while backgrounds that have lost the transgene show a re-establishment of methylation levels similar to Col-0.

The re-establishment of methylation and silencing of CACTA1 after the removal of the TET1CD transgene is in contrast to FWA, where methylation loss was stable in the absence of the transgene, and without wishing to be bound by theory, is likely a consequence of the incomplete removal of DNA methylation in the CACTA1 region that is then able to attract the methylation machinery through self-reinforcing mechanisms. The incomplete demethylation of CACTA1 likely leaves enough residual methylation to attract the RdDM machinery, probably via recruitment of Pol V by the methyl DNA binding proteins SUVH2 and SUVH9 (Johnson L M, et al. (2014) SRA- and SET-domain-containing proteins link RNA polymerase V occupancy to DNA methylation. Nature 507(7490):124-8). In addition, the MET1 CG methyltransferase would likely perpetuate and potentially amplify any remaining methylated CG sites. In this scenario, heritable demethylation might be more efficiently achieved by targeting the TET1cd to multiple adjacent locations to achieve a more complete demethylation. Alternatively, and without wishing to be bound by theory, CACTA1 remethylation may occur because other methylated regions in the genome with sequences homologous to CACTA1 may be able to efficiently target remethylation in trans via siRNAs. In this scenario it may be useful to simultaneously target all homologous sequences for demethylation to reduce the prevalence of remethylation by homologous sequences.

To test the specificity of the targeted demethylation caused by the expression of the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD system, genome-wide methylation levels and methylation levels over all protein coding genes or TEs of T2 plants that contained the transgene (+) or had it segregated away (−) were checked and compared with those of a Col-0 control plant. The results, presented in FIG. 32 and FIG. 33, show that DNA methylation levels across the entire genome were reduced as compared to the Col-0 control in plants that had retained either the pUBQ10::ZF1CACTA1_3×Flag_TET1CD or the pUBQ10::ZF2CACTA1_3×Flag_TET1CD transgene. However, in T2 plants that have had the transgene segregated away, genome-wide DNA methylation levels returned to levels similar to those seen in the Col-0 control background.

Example 9: DNA-Binding Domain-Targeting of Demethylation Factor TET1 (Catalytic Domain) to the ROS1 Locus in Arabidopsis

This Example demonstrates the targeting of the catalytic domain of a TET1 protein to the ROS1 locus in Arabidopsis using synthetic Zinc Finger polypeptides.

Examples 1 and 8 describe the successful TET1 polypeptide targeting scheme using a synthetic zinc finger designed to target the FWA or CACTA1 loci, respectively. This Example describes a similar successful targeting scheme using a synthetic zinc finger designed to target the ROS1 locus.

Materials and Methods

Cloning of pUBQ10::ZF1ROS1_3×Flag_TET1CD

For this purpose, a modified pMDC123 plasmid (Curtis et al, 2003, Plant Phys) was created first, containing 1990 bp of the promoter region of the Arabidopsis UBQ10 gene upstream of a cassette containing a HpaI restriction site and a 3×Flag tag, creating a pMDC123 pUBQ10_3×Flag vector. Both the UBQ10 promoter and 3×Flag are upstream of the gateway cassette (Invitrogen) present in the original pMDC123 plasmid. The TET1cd was delivered into the modified pMDC123 by an LR reaction (Invitrogen), creating an in-frame fusion of the TET1cd cDNA with the upstream 3×Flag cassette and resulting in a pMDC123 pUBQ10_3×Flag_TET1cd vector. The ZF1ROS1 sequence was plant codon optimized, synthesized by IDT, and cloned into the HpaI restriction site of the modified pMDC123_3×Flag_TET1cd plasmid by In-Fusion (Takara), creating the pMDC123 pUBQ10_ZF1ROS1_3×Flag_TET1cd vector.

The nucleotide sequence of pUBQ10::ZF1ROS1_3×Flag_TET1CD is presented in SEQ ID NO: 154. This expression cassette contains a UBQ10 promoter (SEQ ID NO: 22), the ZF1ROS1 DNA binding domain that targets the ROS1 promoter (SEQ ID NO: 155), a 3×Flag tag (SEQ ID NO: 24), the catalytic domain of human TET1 (SEQ ID NO: 25), and an OCS terminator sequence (SEQ ID NO: 26). The pUBQ10::ZF1ROS1_3×Flag_TET1CD expression cassette encodes the ZF1ROS1_3×Flag_TET1CD (SEQ ID NO: 156) fusion protein. Polypeptides in this fusion protein include ZF1ROS1 (SEQ ID NO: 157), 3×Flag (SEQ ID NO: 29), and human TET1-CD (SEQ ID NO: 30).

Plant Transformation

The construct above was introduced into Col-0 wild-type Arabidopsis plants using Agrobacterium-mediated transformation. T1 transgenic plants were selected based on their resistance to BASTA.

Bisulfite Sequencing and Data Analysis

BS-Seq libraries were generated as previously reported (Cokus et al., 2008) and all libraries were sequenced using the HiSeq 2000 platform following manufacturer instructions (Illumina) at a length of 50 bp. Bisulfite-Seq (BS-Seq) reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-seeker. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

Metaplot of WGBS Data

Metaplots of WGBS data were made using custom Perl and R scripts. Regions of interest were broken into 50 bins while flanking 1 kb regions were each broken into 25 bins. CG, CHG and CHH methylation levels in each bin were then determined. Metaplots were then generated with R.
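The three sequence contexts named above are defined by the two bases following a cytosine, where H denotes A, C, or T. As an illustration, a cytosine on the forward strand could be assigned to a context as in the following Python sketch. The function name is hypothetical, and the handling of ambiguous bases and sequence ends (returning None) is an assumption.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the forward-strand cytosine at 0-based index i as CG, CHG or CHH.

    Returns None if the base is not a C, or if the context cannot be
    determined (sequence end reached or an N in the downstream bases).
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"
```

Cytosines on the reverse strand could be handled the same way after taking the reverse complement of the sequence.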

RNA-seq

Raw reads in qseq format obtained from the sequencer were first converted to fastq format with a customized Perl script. Read quality was controlled with FastQC (worldwide web: bioinformatics.babraham.ac.uk/projects/fastqc). High quality reads were then aligned to the TAIR10 reference genome using Tophat (Trapnell et al, 2009) (v 2.0.13) with the ‘--no-coverage-search’ option, allowing up to two mismatches and keeping only reads that mapped to one location. Essentially, reads were first mapped to the TAIR10 gene annotation with known splice junctions. When reads did not map to the annotated genes, the reads were mapped to the TAIR10 genome. The number of reads mapping to genes was calculated by HTseq (Anders et al., 2015) (v 0.5.4) with default parameters. Expression levels were determined by RPKM (reads per kilobase of exons per million aligned reads) in R using customized scripts.

Results

ROS1 is an example of a gene whose expression depends on DNA methylation. Methylation mutants with lower DNA methylation in the ROS1 promoter show reduced ROS1 expression (Lei M, et al. (2015) Regulatory link between DNA methylation and active demethylation in Arabidopsis. Proc Natl Acad Sci 112(11):3553-3557; Williams B P, Pignatta D, Henikoff S, Gehring M (2015) Methylation-Sensitive Expression of a DNA Demethylase Gene Serves As an Epigenetic Rheostat. PLoS Genet 11(3):1-18). To test if ZF1ROS1_TET1-CD can repress the expression of ROS1, wild-type Col-0 plants were transformed with the pUBQ10::ZF1ROS1_3×Flag_TET1CD construct described above. Expression of ROS1 was assayed in one wild-type Col-0 plant and two individual T1 transgenic plants by RNA-seq (FIG. 34). The results presented in FIG. 34 demonstrate that the catalytic domain of human TET1 fused to a zinc finger that targets the ROS1 locus can efficiently repress the expression of ROS1.

To test if the repression of ROS1 expression in plants containing the pUBQ10::ZF1ROS1_3×Flag_TET1CD transgene described in FIG. 34 is due to the loss of methylation in the ROS1 promoter, whole-genome BS-Seq experiments were conducted as described above. The results, presented in FIG. 35 and FIG. 36, show a loss of methylation in the ROS1 promoter in backgrounds that contain the pUBQ10::ZF1ROS1_3×Flag_TET1CD construct. Line 2, which showed the most demethylation (FIG. 35 and FIG. 36), also showed the most RNA downregulation (FIG. 34). This result is consistent with the two aforementioned studies, which suggested that ROS1 expression is controlled by its methylation status.

To test the specificity of the targeted demethylation caused by the expression of the pUBQ10::ZF1ROS1_3×Flag_TET1CD transgene in two independent T1 plants, genome-wide methylation levels and methylation levels over all protein coding genes or TEs were analyzed and compared with those of a Col-0 control plant. The results, presented in FIG. 37 and FIG. 38, show that DNA methylation levels of the ZF1ROS1-TET1cd-2 T1 plant across the entire genome were very slightly reduced as compared to the Col-0 control, indicating a partial non-specific global demethylation, while the methylation levels of the ZF1ROS1-TET1cd-1 line were very similar to wild type. As in Example 8, this underscores the need to choose lines that show minimal genome-wide effects while showing high on-target activity.

Example 10: Heritability and Specificity of the SunTag-Based Targeting of TET1 to the FWA Locus

In the present Example, Applicant provides additional evidence that the SunTag targeting scheme described in Example 5 is able to target a TET1 polypeptide to the FWA locus in Arabidopsis using the CRISPR-CAS9 system. The heritability and specificity of this SunTag targeting scheme are also evaluated.

Materials and Methods

Constructs

Construction of the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct was described in Example 5.

Construction of the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS construct was described in Example 5.

Plant Transformation and Flowering Time Measurement

The constructs described above were transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after the construct was transformed into Agrobacterium). Among a segregating population of T2 plants carrying either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene, flowering time was measured and compared to early-flowering wild-type Col-0 and late-flowering fwa-4 plants. Flowering time was measured by counting the total number of leaves (rosette and cauline) of each individual plant.

Bisulfite Sequencing and Data Analysis

Whole genome bisulfite sequencing (BS-Seq) libraries were generated as previously reported (Cokus et al., 2008) and all libraries were sequenced using the HiSeq 4000 platform following manufacturer instructions (Illumina) at a length of 50 bp. BS-Seq reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-map-2.74. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

Metaplot of WGBS Data

Metaplots of WGBS data were made using custom Perl and R scripts. Regions of interest were broken into 50 bins while flanking 1 kb regions were each broken into 25 bins. CG, CHG and CHH methylation levels in each bin were then determined. Metaplots were then generated with R.

RNA-seq

Raw reads in qseq format obtained from the sequencer were first converted to fastq format with a customized Perl script. Read quality was controlled with FastQC (worldwide web: bioinformatics.babraham.ac.uk/projects/fastqc). High quality reads were then aligned to the TAIR10 reference genome using Tophat (Trapnell et al, 2009) (v 2.0.13) with the ‘--no-coverage-search’ option, allowing up to two mismatches and keeping only reads that mapped to one location. Essentially, reads were first mapped to the TAIR10 gene annotation with known splice junctions. When reads did not map to the annotated genes, the reads were mapped to the TAIR10 genome. The number of reads mapping to genes was calculated by HTseq (Anders et al., 2015) (v 0.5.4) with default parameters. Expression levels were determined by RPKM (reads per kilobase of exons per million aligned reads) in R using customized scripts.

Results

To test if the late flowering phenotype of a late flowering plant containing the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS construct originally described in Table 5A was due to the activation of FWA expression, RNA-seq was performed as described above for the T1 lines containing the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene. The results presented in FIG. 39 show that FWA expression was upregulated in the transgenic lines tested as compared to expression in Col-0 wild type plants, similarly to what was seen in the late flowering fwa-4 epiallele plant.

To test if the late flowering phenotype of a late flowering plant containing the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS construct originally described in Table 5A is due to the loss of methylation in the FWA promoter, whole-genome BS-Seq experiments were conducted as described above. The results, presented in FIG. 40 and FIG. 41, show a loss of methylation in the FWA promoter in the plants that contain the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene, and show that this demethylation was specific to the FWA promoter.

To test the heritability of the late flowering phenotype observed in Example 5 in plants containing either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene, flowering time of a segregating population of T2 plants was assayed. The results, presented in FIG. 42, show that all plants in the T2 generation arising from T1 plants containing either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene also displayed a late flowering phenotype similar to what is seen in the fwa-4 epiallele plants. Thus, even though these T2 plants were segregating 3:1 for the TET1CD containing transgenes, all plants retained the late flowering phenotype, indicative of FWA activation.
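A 3:1 ratio of transgene-containing to transgene-free plants is what would be expected in the T2 progeny of a T1 plant hemizygous for a single transgene insertion. As an illustration only, agreement of observed counts with that ratio could be checked with a chi-square goodness-of-fit statistic, as in the following Python sketch. The function names and plant counts are hypothetical; the text above does not report the counts or state that such a test was performed. The threshold 3.841 is the standard chi-square critical value for one degree of freedom at a significance level of 0.05.

```python
CHI2_CRITICAL_DF1_P05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def chi_square_3_to_1(with_transgene: int, without_transgene: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    n = with_transgene + without_transgene
    if n == 0:
        raise ValueError("at least one plant is required")
    expected_with = 0.75 * n
    expected_without = 0.25 * n
    return ((with_transgene - expected_with) ** 2 / expected_with
            + (without_transgene - expected_without) ** 2 / expected_without)


def consistent_with_3_to_1(with_transgene: int, without_transgene: int) -> bool:
    """True if the counts do not deviate significantly from 3:1 at alpha = 0.05."""
    return chi_square_3_to_1(with_transgene, without_transgene) < CHI2_CRITICAL_DF1_P05


# Hypothetical counts of T2 plants with and without the transgene
statistic = chi_square_3_to_1(60, 40)
```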

To test if the late flowering phenotype observed in T2 plants that either contain the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene, or where the transgene had been segregated away, is due to a loss of methylation at the FWA promoter, whole genome BS-Seq experiments were conducted on individual plants that had retained or lost the transgene in the T2 generation as described above. The results, presented in FIG. 43, FIG. 44, and FIG. 45, show a loss of methylation in the FWA promoter in backgrounds that have either retained the transgene or have had the transgene segregated away. Thus, TET1CD mediated demethylation of FWA is stable in the absence of the transgene, showing that the SunTag TET1CD system can cause heritable changes in DNA methylation. This suggests that the SunTag TET1CD system can potentially be used to create new stable epialleles not found in nature.

To test the specificity of the targeted demethylation caused by the expression of either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS systems in T1 backgrounds and T2 backgrounds that retained the transgene or had it segregated away, genome-wide methylation was checked and compared with that of a Col-0 control plant. The results presented in FIG. 46, FIG. 47, FIG. 48, and FIG. 49 show that genome-wide DNA methylation levels were similar between T1 and T2 plants that contain either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene, T2 plants where either the gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or gRNA4_U6_NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene had been segregated away in the T2, and Col-0 control plants. Thus, the demethylation of FWA in the SunTag TET1CD system was very specific.

The results presented in this Example demonstrate that the demethylation caused by the targeting of the TET1 catalytic domain using the SunTag targeting scheme to the FWA locus is specific and heritable. The specificity of this system is important because, when this tool is used to study locus-specific DNA methylation changes, avoiding off-target effects reduces indirect effects on the expression of a locus of interest.

Example 11: Heritability and Specificity of the DNA Binding Domain Targeting of TET1 to the FWA Locus

In the present Example, Applicant evaluated the heritability and specificity of the Zinc Finger (ZF) targeting scheme that targets the TET1 polypeptide to the FWA locus in Arabidopsis previously described in Example 1.

Materials and Methods

Construction of: pUBQ10_ZF108_3×Flag_YPet

For this purpose, a modified pMDC123 plasmid (Curtis et al, 2003, Plant Phys) was created, containing 1990 bp of the promoter region of the Arabidopsis UBQ10 gene upstream of the BLRP_ZF108_3×Flag cassette. Both the UBQ10 promoter and BLRP_ZF108_3×Flag are upstream of the gateway cassette (Invitrogen) present in the original pMDC123 plasmid. YPet was amplified from a YPet containing plasmid and cloned into the pENTR/D plasmid and then delivered to the modified pMDC123 by an LR reaction. The nucleotide sequence of pUBQ10::ZF108_3×Flag_YPet is presented in SEQ ID NO: 158. This expression cassette contains a UBQ10 promoter (SEQ ID NO: 22), the ZF108 DNA-binding domain that targets the FWA promoter (SEQ ID NO: 23), a 3×Flag tag (SEQ ID NO: 24), the YPet expression domain (SEQ ID NO: 159), and an OCS terminator sequence (SEQ ID NO: 26). The pUBQ10::ZF108_3×Flag_YPet expression cassette encodes the ZF108_3×Flag_YPet fusion protein, whose amino acid sequence is set forth in SEQ ID NO: 160. Polypeptides in the fusion protein include ZF108 (SEQ ID NO: 28), 3×Flag (SEQ ID NO: 29), and YPet (SEQ ID NO: 161).

Flowering Time Measurement

In plants of the T3 generation that have retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene described in Example 1, or have had the transgene segregated away, flowering time was measured and compared to early-flowering wild-type Col-0, homozygous T3 plants carrying the pUBQ10_ZF108_3×Flag_YPet transgene, and late-flowering fwa-4 plants. Flowering time was measured by counting the total number of leaves (rosette and cauline) of each individual plant.

Bisulfite Sequencing and Data Analysis

Whole genome bisulfite sequencing (BS-Seq) libraries were generated as previously reported (Cokus et al., 2008) and all libraries were sequenced using the HiSeq 2000 platform following manufacturer instructions (Illumina) at a length of 50 bp. BS-Seq reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BS-map-2.74. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.

Metaplot of WGBS Data

Metaplots of WGBS data were made using custom Perl and R scripts. Regions of interest were broken into 50 bins while flanking 1 kb regions were each broken into 25 bins. CG, CHG and CHH methylation levels in each bin were then determined. Metaplots were then generated with R.

RNA-seq

Raw reads in qseq format obtained from the sequencer were first converted to fastq format with a customized Perl script. Read quality was controlled with FastQC (worldwide web: bioinformatics.babraham.ac.uk/projects/fastqc). High quality reads were then aligned to the TAIR10 reference genome using Tophat (Trapnell et al, 2009) (v 2.0.13) with the ‘--no-coverage-search’ option, allowing up to two mismatches and keeping only reads that mapped to one location. Essentially, reads were first mapped to the TAIR10 gene annotation with known splice junctions. When reads did not map to the annotated genes, the reads were mapped to the TAIR10 genome. The number of reads mapping to genes was calculated by HTseq (Anders et al., 2015) (v 0.5.4) with default parameters. Expression levels were determined by RPKM (reads per kilobase of exons per million aligned reads) in R using customized scripts.

Results

As previously shown in Example 1, T1 plants containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene were late flowering like fwa-4 plants as compared to Col-0 controls (FIG. 50A). To test if the late flowering phenotype observed in plants containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene in Example 1 was heritable in subsequent generations, flowering time of populations of T3 plants that had either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene or had the transgene segregated away in the T2 was assayed along with Col-0, fwa-4 and T3 plants containing the pUBQ10_ZF108_3×Flag_YPet control transgene. The results, presented in FIG. 50B, show that all plants that had either retained the pUBQ10::ZF108_3×Flag_TET1-CD transgene or had the transgene segregated away in the T2 showed a late flowering phenotype. This demonstrated that the late flowering phenotype caused by the TET1-CD is heritable even in the absence of the TET1-CD transgene. In addition, control plants expressing a fusion of ZF108 to the fluorescent protein YPet (ZF108-YPet) did not show any effect on flowering time, indicating that the late flowering phenotype observed is not simply a consequence of ZF108 binding to the FWA promoter (FIG. 50B).

To test if the observed late flowering phenotype in T1 plants containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene was due to FWA activation, RNA-seq was performed with one Col-0, one fwa-4, and four independent T1 lines containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene. FIG. 51A shows that FWA is activated in plants containing the transgene to a level similar to that observed in fwa-4 plants. RNA-seq was also performed with four biological replicates from two independent T3 lines containing the pUBQ10::ZF108_3×Flag_TET1-CD transgene, four biological replicates from two independent T3 lines containing pUBQ10::ZF108_3×Flag_YPet, and four biological replicates of Col-0 control plants. The results presented in FIG. 51B show that FWA was upregulated in all pUBQ10::ZF108_3×Flag_TET1-CD plants tested, but not in the pUBQ10::ZF108_3×Flag_YPet or Col-0 plants. These results, in addition to those shown in FIG. 5 of Example 1, demonstrate that activation of FWA caused by the specific targeting of the TET1 catalytic domain to a genomic region can be heritable over multiple generations. In addition, control plants expressing pUBQ10::ZF108_3×Flag_YPet did not show any effect on FWA expression, showing that the FWA overexpression phenotype observed in pUBQ10::ZF108_3×Flag_TET1-CD plants is not simply a consequence of ZF108 binding to the FWA promoter. RNA-seq data showed very few additional changes and revealed FWA as the most upregulated gene in the ZF108-TET1cd lines as compared to ZF108-YPet control lines (FIG. 52). These results demonstrate successful removal of methylation at the FWA promoter and activation of FWA expression and, importantly, very few off-target effects due to ZF108-TET1cd expression.

To test if the late flowering phenotype observed in the T3 plants was due to a loss of methylation at the FWA promoter, whole genome BS-Seq experiments were conducted on individual plants that had retained or lost the transgene as described above. The results, presented in FIG. 53, FIG. 54, and FIG. 55, show that loss of methylation caused by the specific targeting of the TET1 catalytic domain to a genomic region can be heritable over multiple generations even in plants that have had the pUBQ10::ZF108_3×Flag_TET1-CD transgene segregated away. These results also show that regions adjacent to FWA showed very little change in methylation, indicating that targeting of the TET1-CD to FWA causes highly localized and precise demethylation.

To test the specificity of the targeted demethylation caused by the expression of the pUBQ10::ZF108_3×Flag_TET1-CD system in T1 plants, T3 plants that retained the transgene, or T3 plants that had the transgene segregated away, genome-wide methylation was analyzed and compared with that of a Col-0 control plant. The results presented in FIG. 56, FIG. 57, FIG. 58, and FIG. 59 show that genome-wide DNA methylation levels were similar among all backgrounds examined. In T3 plants that had retained or lost the transgene, methylation levels over protein coding genes and transposable elements were also analyzed. The results presented in FIG. 60 show that methylation levels over protein coding genes and transposable elements were similar among all backgrounds examined. These data show that expression of the TET1-CD in these plants had very little genome-wide effect on methylation levels.

The results presented in this example demonstrate that the demethylation caused by the targeting of the TET1 catalytic domain using the ZF targeting scheme to the FWA locus is highly specific and heritable. Thus, specific and highly efficient ZF proteins can be designed for targeted demethylation of genomic regions of interest, for both research and agricultural purposes.

Example 12: SunTag Control Transgenes that are not Targeted to a Specific Locus

In the present Example, Applicant used the SunTag targeting scheme without a specific guide RNA to demonstrate that the targeting of demethylation by TET1-CD requires a specific guide RNA and is therefore not caused by non-specific expression of the TET1-CD.

Example 4 describes a SunTag-based targeting scheme to target a TET1 catalytic polypeptide to a target nucleic acid. This Example describes a SunTag targeting scheme in which a TET1 polypeptide was not targeted to any locus in Arabidopsis using the CRISPR-CAS9 system. A schematic of the targeting system is presented in FIG. 61.

Materials and Methods

Construction of:

-   NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS
-   NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS

For this purpose, a dCAS9_1×HA_3×NLS_10×GCN4 that contains a 22aa spacer between epitopes (dCAS9_1×HA_3×NLS_10×GCN422aa) and a dCAS9_1×HA_3×NLS_10×GCN4 that contains a 14aa spacer between epitopes (dCAS9_1×HA_3×NLS_10×GCN414aa) were created through a combination of gene synthesis and the utilization of plasmids from Addgene, and separately cloned into a modified pMTN3164 plasmid downstream of a fragment containing 1994 bp of the promoter region of the Arabidopsis UBQ10 gene followed by an omega RBC translational enhancer and upstream of an OCS terminator, creating pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS. An insulator sequence followed by a second fragment containing 1994 bp of the promoter region of the Arabidopsis UBQ10 gene was then cloned upstream of pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS such that transcription of downstream targets resulting from this second UBQ10 promoter would occur opposite the dCAS9_1×HA_3×NLS_10×GCN422aa or dCAS9_1×HA_3×NLS_10×GCN414aa transcription. A scFv_sfGFP_1×HA_2×NLS_TET1CD sequence created through a combination of gene synthesis and the utilization of plasmids from Addgene was then cloned downstream of the second UBQ10 promoter in both vectors, creating pMTN3164 TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS. A NOS terminator was then cloned downstream of TET1cd in both the TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS constructs, creating the pMTN3164 NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and pMTN3164 NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS constructs.

The expression cassettes of NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS differ only in the 10×GCN4 sequence. These vectors contain a number of features. The nucleotide sequences of the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS expression cassettes are presented in SEQ ID NO: 162 and SEQ ID NO: 163, respectively. These cassettes are described as single cassettes, but contain different expression regions: (1) one that encodes the dCAS9-10×GCN4 fusion protein and (2) one that encodes the scFv-sfGFP-TET1-CD fusion protein. The cassettes include a UBQ10 promoter (SEQ ID NO: 108), Omega RBC (SEQ ID NO: 109), dCAS9 (SEQ ID NO: 110), 1×HA (SEQ ID NO: 111), 3×NLS (SEQ ID NO: 112), 2×NLS (SEQ ID NO: 113), linkers (SEQ ID NO: 114), 10×GCN422aa (SEQ ID NO: 115) or 10×GCN414aa (SEQ ID NO: 116), OCS terminator (SEQ ID NO: 117), insulator (SEQ ID NO: 118), scFv (SEQ ID NO: 119), sfGFP (SEQ ID NO: 120), TET1-CD (SEQ ID NO: 121), and NOS terminator (SEQ ID NO: 122).

The amino acid sequence of the polypeptide fusion of dCAS9_1×HA_3×NLS_10×GCN422aa is presented in SEQ ID NO: 123 and the amino acid sequence of the polypeptide fusion of dCAS9_1×HA_3×NLS_10×GCN414aa is presented in SEQ ID NO: 124. Relevant amino acid sequences present in these fusion proteins include, for example: dCAS9 (SEQ ID NO: 125), 1×HA (SEQ ID NO: 126), 3×NLS (SEQ ID NO: 127), linker (SEQ ID NO: 128), and 10×GCN422aa (SEQ ID NO: 129) or 10×GCN414aa (SEQ ID NO: 130).

The amino acid sequence of the polypeptide fusion of scFv_sfGFP_1×HA_2×NLS_TET1CD is presented in SEQ ID NO: 131 and is identical in both the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS and NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS vectors. Relevant amino acid sequences present in this fusion protein include, for example: scFv (SEQ ID NO: 132), sfGFP (SEQ ID NO: 133), 1×HA (SEQ ID NO: 134), 2×NLS (SEQ ID NO: 135), linkers (SEQ ID NO: 136), and TET1-CD (SEQ ID NO: 137).

Plant Transformation and Flowering Time Measurement

The constructs described above were transformed into Col-0 wild-type plants using Agrobacterium-mediated genetic transformation (after each construct was transformed into Agrobacterium). This process involves transforming plants via floral dip using methods well known in the art. Progeny of transformed plants (T1s) were screened for Hygromycin resistance. Among the Hygromycin-resistant transgenic plants, flowering time was measured and compared to early-flowering wild-type Col-0 and late-flowering fwa-4 plants. Flowering time was measured by counting the total number of leaves (rosette and cauline) of each individual plant.

Bisulfite Sequencing and Data Analysis

Whole genome bisulfite sequencing (BS-Seq) libraries were generated as previously reported (Cokus et al., 2008) and all libraries were sequenced using the HiSeq 4000 platform following the manufacturer's instructions (Illumina) at a read length of 50 bp. BS-Seq reads were aligned to the TAIR10 version of the Arabidopsis thaliana reference genome using BSMAP 2.74. For BS-Seq, up to 2 mismatches were allowed and only uniquely mapped reads were used.
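The read-retention rule stated above (at most 2 mismatches, uniquely mapped reads only) can be illustrated with a minimal Python sketch. The record fields `mismatches` and `n_hits` are hypothetical names chosen for illustration; they are not the output format of the aligner, and this is not the pipeline actually used.

```python
# Minimal sketch of the stated filter: keep a read only if it maps to a
# single genomic location and has at most 2 mismatches to the reference.
# The dictionary keys below are illustrative assumptions, not aligner output.

MAX_MISMATCHES = 2


def keep_alignment(aln, max_mismatches=MAX_MISMATCHES):
    """Return True if the alignment is unique and within the mismatch limit."""
    return aln["n_hits"] == 1 and aln["mismatches"] <= max_mismatches


def filter_alignments(alignments, max_mismatches=MAX_MISMATCHES):
    """Return only the alignments that pass the uniqueness and mismatch rules."""
    return [a for a in alignments if keep_alignment(a, max_mismatches)]


example_alignments = [
    {"read_id": "r1", "mismatches": 0, "n_hits": 1},  # kept
    {"read_id": "r2", "mismatches": 3, "n_hits": 1},  # too many mismatches
    {"read_id": "r3", "mismatches": 1, "n_hits": 2},  # not uniquely mapped
    {"read_id": "r4", "mismatches": 2, "n_hits": 1},  # kept (limit inclusive)
]
kept_ids = [a["read_id"] for a in filter_alignments(example_alignments)]
```

In practice an aligner typically applies such limits itself during mapping; the sketch only makes the two criteria explicit.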

Metaplot of WGBS Data

Metaplots of WGBS data were made using custom Perl and R scripts. Regions of interest were broken into 50 bins while flanking 1 kb regions were each broken into 25 bins. CG, CHG and CHH methylation levels in each bin were then determined. Metaplots were then generated with R.
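The binning scheme described above (50 bins across each region of interest, 25 bins across each 1 kb flank) can be sketched in Python as follows. This is an illustration under stated assumptions, not the custom Perl and R scripts themselves: regions are treated as half-open intervals on a single chromosome, strand is ignored, and the per-bin level is taken as pooled methylated reads over pooled total reads, which is one common convention that the text does not specify.

```python
# Sketch of metaplot binning: 25 upstream bins, 50 body bins, 25 downstream
# bins, giving 100 bins per region. All names here are illustrative.

def bin_index(pos, start, end, flank=1000, body_bins=50, flank_bins=25):
    """Map a position to a 0-based metaplot bin, or None if outside the window.

    The region of interest is the half-open interval [start, end).
    """
    if start - flank <= pos < start:  # upstream flank
        return (pos - (start - flank)) * flank_bins // flank
    if start <= pos < end:  # region of interest
        return flank_bins + (pos - start) * body_bins // (end - start)
    if end <= pos < end + flank:  # downstream flank
        return flank_bins + body_bins + (pos - end) * flank_bins // flank
    return None


def metaplot(regions, cytosines, flank=1000, body_bins=50, flank_bins=25):
    """Pool read counts per bin over all regions and return per-bin levels.

    regions:   list of (start, end) tuples
    cytosines: list of (position, methylated_reads, total_reads) tuples
    Bins with no coverage are reported as None.
    """
    n_bins = 2 * flank_bins + body_bins
    meth = [0] * n_bins
    total = [0] * n_bins
    for start, end in regions:
        for pos, m, t in cytosines:
            b = bin_index(pos, start, end, flank, body_bins, flank_bins)
            if b is not None:
                meth[b] += m
                total[b] += t
    return [m / t if t else None for m, t in zip(meth, total)]


levels = metaplot(
    regions=[(2000, 3000)],
    cytosines=[(1000, 5, 10), (2000, 0, 10), (2999, 10, 10), (3999, 2, 4), (5000, 1, 1)],
)
```

To produce separate CG, CHG and CHH curves, the same function would be called once per sequence context with the cytosines of that context only.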

Results

To test if NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS can trigger demethylation and reactivate FWA expression, wild-type Col-0 plants were transformed with the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene described above. Flowering time of T1 transgenic plants was assayed. The results, presented in FIG. 62, show that all T1 plants containing either the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene displayed an early flowering phenotype similar to that of Col-0 wild type plants. Thus, even though these T1 plants contained the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene, effects on flowering time were not observed, ruling out the possibility of non-specific FWA reactivation due to these transgenes when a gRNA is not present.

To test if the early flowering plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene described in FIG. 61 show any loss of methylation in the FWA promoter or the CACTA1 promoter (described in Example 6), whole-genome BS-Seq experiments were conducted as described above. The results presented in FIG. 63 show that plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene show a level of methylation in the FWA promoter similar to that seen in the Col-0 wild type background. The results presented in FIG. 64 show that plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS transgene show a level of methylation in the CACTA1 promoter similar to that seen in the Col-0 wild type background.

To test if the plants containing the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene showed any genome-wide changes in CG, CHG or CHH methylation levels caused by the expression of the NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN422aa_OCS or NOS_TET1CD_2×NLS_1×HA__sfGFP_scFv_UBQ10_INSULATOR_UBQ10_dCAS9_1×HA_3×NLS_10×GCN414aa_OCS transgene, genome-wide methylation levels were determined and compared with those of a Col-0 control plant. The results presented in FIG. 65 show that DNA methylation levels across the entire genome were similar among all backgrounds examined.
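For readers unfamiliar with the three sequence contexts compared here, the following Python sketch shows how a forward-strand cytosine can be assigned to CG, CHG or CHH (H being A, C or T) and how a per-context level can be computed. The pooled ratio of methylated reads to total reads is an assumed convention for illustration; the text does not state how genome-wide levels were calculated, and the function names are hypothetical.

```python
# Sketch: classify cytosine sequence context and summarize methylation per
# context. Only the forward strand is considered, for brevity.

H_BASES = ("A", "C", "T")


def cytosine_context(seq, i):
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at index i, else None.

    None is returned if seq[i] is not C, if the context runs off the end of
    the sequence, or if an ambiguous base (e.g. N) is encountered.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    following = seq[i + 1:i + 3]
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in H_BASES:
        return None
    if following[1] == "G":
        return "CHG"
    if following[1] in H_BASES:
        return "CHH"
    return None


def context_levels(calls):
    """Pool (context, methylated_reads, total_reads) calls into per-context levels."""
    sums = {"CG": [0, 0], "CHG": [0, 0], "CHH": [0, 0]}
    for ctx, m, t in calls:
        if ctx in sums:
            sums[ctx][0] += m
            sums[ctx][1] += t
    return {ctx: (m / t if t else None) for ctx, (m, t) in sums.items()}


example_seq = "ACGTCAGCTTC"
contexts = [cytosine_context(example_seq, i) for i in range(len(example_seq))]
summary = context_levels([("CG", 8, 10), ("CG", 2, 10), ("CHG", 1, 4), ("CHH", 0, 5)])
```

Comparing such per-context summaries between transgenic and Col-0 samples is the kind of check the paragraph above describes.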

The results in this Example show that expression of SunTag TET1-CD constructs without any specific guide RNAs has little effect on DNA methylation at specific loci or in the genome in general. These results further underscore that the SunTag TET1-CD system is highly specific for the targeted locus. This SunTag system can therefore be used to specifically target single loci for targeted DNA demethylation, or a multiplexing strategy can be taken to specifically and efficiently target multiple loci simultaneously.

Example 13: Targeting the Catalytic Domain of a TET2 or TET3 Polypeptide to a Target Nucleic Acid

This Example describes exemplary protocols for targeting the catalytic domain of a TET2 polypeptide or a TET3 polypeptide to a target nucleic acid to induce demethylation of the target nucleic acid.

Materials and Methods for this targeting are generally analogous to those described in prior examples. For DNA-binding domain based targeting, the methods outlined in Example 8 may be applied. For SunTag based targeting, the methods outlined in Example 6 may be applied. The catalytic domain of TET1 (TET1-CD) may be replaced with the catalytic domain of TET2 (e.g. SEQ ID NO: 192) or the catalytic domain of TET3 (e.g. SEQ ID NO: 194).

Following vector construction and plant transformation, an exemplary target nucleic acid (e.g. FWA) may be assayed via expression analysis such as qPCR to evaluate the level of expression of the target nucleic acid. Bisulfite sequencing may be used to probe the methylation status of the target nucleic acid.

It is expected that targeting the catalytic domain of TET2 or TET3 to a target nucleic acid in plants will result in decreased methylation of the target nucleic acid.

REFERENCES

-   Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Ito S, Shen L, Dai Q, Wu S C, Collins L B, Swenberg J A, He C, Zhang Y. Science. 2011 Sep. 2; 333(6047):1300-3. doi: 10.1126/science.1210597. Epub 2011 Jul. 21.
-   Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain. Guo J U, Su Y, Zhong C, Ming G L, Song H. Cell. 2011 Apr. 29; 145(3):423-34. doi: 10.1016/j.cell.2011.03.022. Epub 2011 Apr. 14.
-   SRA- and SET-domain-containing proteins link RNA polymerase V occupancy to DNA methylation. Johnson L M, Du J, Hale C J, Bischof S, Feng S, Chodavarapu R K, Zhong X, Marson G, Pellegrini M, Segal D J, Patel D J, Jacobsen S E. Nature. 2014 Mar. 6; 507(7490):124-8. doi: 10.1038/nature12931. Epub 2014 Jan. 22.
-   A CRISPR-based approach for targeted DNA demethylation. Xu X, Tao Y, Gao X, Zhang L, Li X, Zou W, Ruan K, Wang F, Xu G L, Hu R. Cell Discov. 2016 May 3; 2:16009. doi: 10.1038/celldisc.2016.9. eCollection 2016.
-   Editing DNA Methylation in the Mammalian Genome. Liu X S, Wu H, Ji X, Stelzer Y, Wu X, Czauderna S, Shu J, Dadon D, Young R A, Jaenisch R. Cell. 2016 Sep. 22; 167(1):233-247.e17. doi: 10.1016/j.cell.2016.08.056.
-   Inheritable Silencing of Endogenous Genes by Hit-and-Run Targeted Epigenetic Editing. Amabile A, Migliara A, Capasso P, Biffi M, Cittaro D, Naldini L, Lombardo A. Cell. 2016 Sep. 22; 167(1):219-232.e14. doi: 10.1016/j.cell.2016.09.006.
-   Targeted DNA demethylation in vivo using dCas9-peptide repeat and scFv-TET1 catalytic domain fusions. Morita S, Noguchi H, Horii T, Nakabayashi K, Kimura M, Okamura K, Sakai A, Nakashima H, Hata K, Nakashima K, Hatada I. Nat Biotechnol. 2016 Aug. 29. doi: 10.1038/nbt.3658
-   CRISPR-dCas9 mediated TET1 targeting for selective DNA demethylation at BRCA1 promoter. Choudhury S R, Cui Y, Lubecka K, Stefanska B, Irudayaraj J. Oncotarget. 2016 Jun. 23. doi: 10.18632/oncotarget.10234. [Epub ahead of print]
-   Induced DNA demethylation by targeting Ten-Eleven Translocation 2 to the human ICAM-1 promoter. Chen H, Kazemier H G, de Groote M L, Ruiters M H, Xu G L, Rots M G. Nucleic Acids Res. 2014 February; 42(3):1563-74. doi: 10.1093/nar/gkt1019. Epub 2013 Nov. 4.
-   Targeted DNA demethylation and activation of endogenous genes using programmable TALE-TET1 fusion proteins. Maeder M L, Angstman J F, Richardson M E, Linder S J, Cascio V M, Tsai S Q, Ho Q H, Sander J D, Reyon D, Bernstein B E, Costello J F, Wilkinson M F, Joung J K. Nat Biotechnol. 2013 December; 31(12):1137-42. doi: 10.1038/nbt.2726. Epub 2013 Oct. 9.
-   Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (Oxford, England) 25, 1105-1111 (2009).
-   Anders, S., Pyl, P. T. & Huber, W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics (Oxford, England) 31, 166-169 (2015).
-   Xie, X et al, Boosting CRISPR/Cas9 multiplex editing capability with the endogenous tRNA-processing system. 2015, Proc Natl Acad Sci USA. 2015 Mar. 17; 112(11):3570-5
-   Pastor W. A., Aravind L., Rao A. TETonic shift: biological roles of TET proteins in DNA demethylation and transcription. Nat Rev Mol Cell Biol. 14, 341-356 (2013).
-   Ito S., D'Alessio A. C., Taranova O. V., Hong K., Sowers L. C., Zhang Y. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129-1133 (2010).
-   Hashimoto et al, 2014 Feb. 20: 506(7488):391-5
-   Ito et al, Nature, 2010, Aug. 26: 466(7310): 1129-1133

1-38. (canceled)

39. A method for reducing methylation of a target nucleic acid in a plant, comprising: (a) providing a plant comprising a recombinant methylcytosine dioxygenase polypeptide that comprises the amino acid sequence of SEQ ID NO: 189 and that is capable of being targeted to a target nucleic acid; and (b) growing the plant under conditions whereby the recombinant polypeptide is targeted to the target nucleic acid, thereby reducing methylation of the target nucleic acid.

40. The method of claim 39, wherein the recombinant methylcytosine dioxygenase polypeptide is targeted to the target nucleic acid via a SunTag targeting system.

41-45. (canceled)

46. The method of claim 39, wherein the recombinant methylcytosine dioxygenase polypeptide is targeted to the target nucleic acid via a DNA-binding domain.

47. The method of claim 39, wherein the methylcytosine dioxygenase polypeptide is a TET polypeptide.

48. The method of claim 47, wherein the TET polypeptide is a TET1 polypeptide.

49. The method of claim 48, wherein the TET1 polypeptide comprises the catalytic domain of TET1.

50. The method of claim 49, wherein the TET1 polypeptide comprises an amino acid sequence that is at least 80% identical to SEQ ID NO: 8.

51-52. (canceled)

53. The method of claim 39, wherein expression of the target nucleic acid is activated as compared to a corresponding control nucleic acid.

54. A recombinant nucleic acid comprising a plant promoter and which encodes a recombinant polypeptide comprising a DNA-binding domain and a methylcytosine dioxygenase polypeptide that comprises the amino acid sequence of SEQ ID NO: 189.

55. An expression vector comprising the recombinant nucleic acid of claim 54.

56. A host cell comprising the expression vector of claim 55.

57. A recombinant plant comprising the recombinant nucleic acid of claim 54.

58. A plant having reduced methylation of a target nucleic acid as a consequence of the method of claim 39.

59. A progeny plant of the plant of claim 58.

60. The progeny plant of claim 59, wherein the progeny plant has reduced methylation of the target nucleic acid and does not comprise the recombinant methylcytosine dioxygenase polypeptide.

61. (canceled)

62. A recombinant vector comprising: a first nucleic acid sequence comprising a plant promoter and that encodes a recombinant polypeptide comprising a nuclease-deficient CAS9 polypeptide (dCAS9) or fragment thereof and a multimerized epitope; a second nucleic acid sequence comprising a plant promoter and that encodes a recombinant polypeptide comprising a methylcytosine dioxygenase polypeptide that comprises the amino acid sequence of SEQ ID NO: 189, and an affinity polypeptide that specifically binds to the epitope; and a third nucleic acid sequence comprising a promoter and that encodes a crRNA and a tracrRNA, or fusions thereof.

63. A host cell comprising the vector of claim 62.

64. A recombinant plant comprising the vector of claim 63.

65-67. (canceled)